MIT Concurrent VLSI Architecture Memo 40 



Massachusetts Institute of Technology 
Artificial Intelligence Laboratory 



MDP Programmer's Manual 



Michael Noakes* 



Abstract 

The Message Driven Processor, MDP, is a VLSI implementation of an integrated processor core for
fine-grained parallel computers. The processor is to be used to construct an experimental massively
parallel computer, the MIT J-Machine. Primitive mechanisms are embedded to support a wide variety of
programming models including a message passing organisation, shared memory applications, data-parallel
programming, and dataflow applications. Target applications include CAD design and simulation, physical
modeling, transaction processing, graphics rendering, parallel language design, and architectural evaluation.

This document attempts to address the needs of a wide range of potential readers. The early chapters
focus on the requirements of new users with little exposure to the principles and motivations of the MDP or
the J-Machine parallel computer. The later sections focus on details of the chip: the format of the registers,
special features and so on. The document concludes with the use of the current assembler MDPSim and examples
of code fragments to clarify the discussions. It is also intended to serve as a complete reference manual for the
MDP and as such supersedes the previous such document, "Message-Driven Processor Architecture Version
11".



Keywords: Processor Architecture, VLSI, Parallel Processing, Message Driven Processor, Fine-Grain, 
Networks, Cache, Concurrent Smalltalk. 



1 This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology.
The research described in this paper was supported in part by the Defense Advanced Research Projects Agency of the Department
of Defense under contracts N00014-80-C-0622 and N00014-85-K-0124 and in part by a National Science Foundation Presidential
Young Investigator Award with matching funds from General Electric Corporation and IBM Corporation.

* Includes portions of CVA memo 14



Contents 



1 ARCHITECTURAL OVERVIEW
    1.1 OVERVIEW
    1.2 INTEGER CORE UNIT
    1.3 NETWORK INTERFACE
    1.4 NAME CACHE
    1.5 DIAGNOSTIC INTERFACE
    1.6 SOFTWARE DEVELOPMENT
    1.7 THE PROTOTYPE J-MACHINE

2 PROGRAMMING MODELS OF THE MDP
    2.1 THE MESSAGE PASSING MODEL
    2.2 THE NODE ID
    2.3 OBJECTS
    2.4 MESSAGES
    2.5 PRIORITIES
    2.6 TYPES
        2.6.1 Symbol
        2.6.2 Integer
        2.6.3 Boolean
        2.6.4 Address
        2.6.5 Instruction Pointer
        2.6.6 Message
        2.6.7 Context Future
        2.6.8 Future
        2.6.9 User Defined
        2.6.10 Instruction Type
    2.7 THE DATA AND ADDRESS REGISTERS
    2.8 HANDLING A SIMPLE MESSAGE
    2.9 A0-RELATIVE ADDRESSING
    2.10 A3-RELATIVE ADDRESSING
    2.11 FAULTS
        2.11.1 Asynchronous Faults
        2.11.2 Calls
        2.11.3 Unchecked Mode
    2.12 NETWORK INTERFACE
        2.12.1 Message Queues
        2.12.2 Message Reception
        2.12.3 Suspend
        2.12.4 Message Transmission
        2.12.5 Faults Associated with the Network
        2.12.6 Initialising the Network
    2.13 EXTERNAL MEMORY INTERFACE
    2.14 DIAGNOSTIC INTERFACE

3 REGISTERS AND MEMORY
    3.1 DATA REGISTERS
    3.2 ADDRESS REGISTERS
    3.3 INSTRUCTION POINTER
    3.4 FAULT REGISTERS
    3.5 NETWORK CONTROL REGISTERS
    3.6 NAME CACHE CONTROL
    3.7 ID REGISTERS
    3.8 MEMORY ADDRESS REGISTER
    3.9 PROCESSOR STATUS FLAGS
        3.9.1 Background Priority
    3.10 MEMORY MAP
        3.10.1 Priority Switchable Memory
    3.11 EXCEPTIONS
        3.11.1 Reset
        3.11.2 Fault Processing
        3.11.3 System Calls
        3.11.4 Interrupts
        3.11.5 Fault Types

4 INSTRUCTION SET
    4.1 INSTRUCTION ENCODING AND ADDRESS MODES
        4.1.1 Normal Addressing Mode
        4.1.2 Register Oriented Addressing Mode
        4.1.3 Instruction Row Buffer
    4.2 MOVE AND TYPE OPERATIONS
    4.3 ARITHMETIC OPERATIONS
    4.4 LOGICAL OPERATIONS
    4.5 COMPARISON OPERATIONS
    4.6 BRANCH OPERATIONS
    4.7 NETWORK OPERATIONS
    4.8 SPECIAL INSTRUCTIONS
    4.9 NAME CACHE OPERATIONS
    4.10 MDP BUGS

5 PROGRAMMING EXAMPLES
    5.1 THE FORM OF A TYPICAL PROGRAM
    5.2 INITIALIZING THE MDP
    5.3 INITIALIZING THE NNRs
    5.4 ACCESSING OTHER PRIORITIES
    5.5 LONG JUMPS

6 Appendix A






Chapter 1 

ARCHITECTURAL OVERVIEW 



The Message Driven Processor, MDP, is a VLSI implementation of an integrated processor core for
fine-grained parallel computers. The processor is to be used to construct an experimental massively parallel
computer, the MIT J-Machine. Primitive mechanisms are embedded to support a wide variety of
programming models including a message passing organisation, shared memory applications, data-parallel
programming, and dataflow applications. Target applications include CAD design and simulation, physical modeling,
transaction processing, graphics rendering, parallel language design, and architectural evaluation.

This document attempts to address the needs of a wide range of potential readers. The early chapters
focus on the requirements of new users with little exposure to the principles and motivations of the MDP or
the J-Machine parallel computer. The later sections focus on details of the chip: the format of the registers,
special features and so on. The document concludes with the use of the current assembler MDPSim and examples
of code fragments to clarify the discussions. It is also intended to serve as a complete reference manual for the
MDP and as such supersedes the previous such document, "Message-Driven Processor Architecture Version
11".

1.1 OVERVIEW 

The Message Driven Processor includes on a single chip: 

• 32 bit integer core 

• On-chip 4 Kword static ram 

• Three dimensional router and integrated network interface 

• Multiple register banks for fast context switching among three priorities of execution 

• Tag support for runtime type checking and for synchronisation primitives 

• Fast trapping mechanism for exceptional situations 

• Segment-based address unit 

• Name cache for the support of a global name space 

• Support for up to 1 Mword of external DRAM including Error Correction coding, dynamic refresh, 
and both static-column and page-mode memory devices. 

• Low-level diagnostic interface for system initialisation 

To minimise contention for the shared on-chip static RAM, the major modules read and write 4 words of
this memory in a single cycle. The core arithmetic and logical instructions operate in a single clock cycle.
Multiple cycle operations are controlled by a hardware state machine controller to free the programmer
from the need to monitor pipeline dependencies in software. The instruction set of the processor follows
conventional practices for streamlined execution while balancing the need for increased code density on
fine-grained processing nodes.

As the name suggests, the major activities of the MDP are scheduled by the reception of messages from
within the J-Machine. Messages are routed through to a neighboring node or queued for the current node
without intervention of the integer core. When the integer processor completes the operations associated
with the current thread, the next message is dispatched efficiently under hardware control. Messages can
be processed at one of two priorities and independent register banks provide rapid transitions between the
priorities. In addition, there is a background priority that executes whenever the two message queues are
empty. The major functional blocks are indicated in Figure 1.1. Typical data rates are indicated for the main
external interfaces assuming a 16 MHz processor clock. These values are provided for expository purposes
and do not necessarily correspond to the final operational clock rate of the J-Machine.

[Figure 1.1: Major Chip Modules. Block diagram showing the network interface (X, Y and Z dimensions,
NetOut, send buffers, NNR; 32 MBytes/sec), the register file (BKG, P1, P0; R0-R3, A0-A3, IP, FIP, FOP0,
FOP1, FIR, ID0-ID3), the on-chip SRAM (4K x 36 bits; P1 and P0 scratch areas, fault vectors, message
queues and QBM, call table, TBM, name cache, program and data), the integer core (4 bit tag + 32 bit
data), the EMEM interface (ECC control, DRAM refresh, static column or page mode; 14 MBytes/sec) and
the diagnostic interface (1 MByte/sec).]



1.2 INTEGER CORE UNIT 

The integer core includes both the arithmetic and logical execution unit and the addressing unit that supports
a base-and-bounds segment addressing protection scheme.
The instruction set includes 



• Arithmetic and logical operations on operands in the general purpose register file; additionally one of
the source operands can be from memory. Load and store operations between memory and the data
registers.

• Transfers between the data registers and the address registers, the network control registers, the name 
cache control registers, and the status registers. It is also possible to transfer operands between registers 
in different priorities. 

• Flow of control instructions for both short and long offsets, and an efficient call mechanism for low-level 
primitive routines. 



The integer processor operates on tagged data; every word of the machine is tagged with a 4 bit value that 
indicates the type of the data stored there. Although these type tags are always monitored by the processor 
core, a section of code may inhibit the triggering of tag exceptions at the discretion of the programmer. In 
this way, the programmer may choose to perform a sequence of "non-standard" operations on the tagged 
values without regard to the standard type error processing. This mode of execution will normally be selected 
for reasons of efficiency within the low-level core of a systems oriented programming application. Tags are 
also used to provide fast trapping for two primitive synchronisation types supported by the hardware. 

In addition to the type checking performed by the hardware, structure references are normally bounds- 
checked by the processor. A physical address consists of a base and a length and indexed accesses into the 
vector defined by this tuple are bounds-checked in parallel with the initiation of the access. 



1.3 NETWORK INTERFACE 



The network interface supports the injection, transmission, and reception of messages within the machine. 
Messages are injected using a set of SEND instructions. These instructions implement all of the standard 
addressing modes of the instruction set thereby simplifying the injection of message arguments. Messages 
are transmitted across the network independently of the processor core using the network routers integrated 
into every MDP. A message may be injected at one of two priorities. Messages at the higher priority 
receive preferential routing and make use of independent buffers. When a message reaches its destination, 
it is added to the tail of a queue associated with the priority of the message. Messages are scheduled for 
execution efficiently by the hardware whenever the associated priority is ready and certain special control 
flags permit. 

The network control registers and queue memory are directly accessible by user code. The programmer
may control the location and length of both message queues. Each queue may be up to 1024 words and must
be located in on-chip memory. The general trap mechanism is used to support flow-control, by inhibiting the
injection of messages when the network is congested, and to trap to system code to handle queue overflows.



1.4 NAME CACHE 



A portion of the on-chip memory may be configured as a general purpose translation table. Data is entered
explicitly with an associated key and, in the absence of later hash collisions, may be extracted efficiently 
using that key. One of the more obvious applications of this mechanism is the support of a global namespace 
for objects in the machine. This table is implemented as a two-way set associative cache with explicit enter 
and lookup operations. It can be configured to contain up to 512 key-value pairs. 
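
The behaviour described above can be illustrated with a small software model. The following Python sketch is not the hardware implementation: the class name, the modulo hash, and the policy of evicting the older entry of a full set are all assumptions made for illustration, since none of these are specified in this section.

```python
class NameCache:
    """Toy model of a two-way set-associative table with explicit enter/lookup."""

    def __init__(self, pairs=512):
        if pairs <= 0 or pairs % 2 != 0:
            raise ValueError("pairs must be a positive even number")
        self.num_sets = pairs // 2
        # Each set holds at most two (key, value) entries.
        self.sets = [[] for _ in range(self.num_sets)]

    def _index(self, key):
        # Assumption: a simple modulo hash; the real hash is not described here.
        return key % self.num_sets

    def enter(self, key, value):
        ways = self.sets[self._index(key)]
        for i, (k, _) in enumerate(ways):
            if k == key:
                ways[i] = (key, value)   # overwrite an existing entry
                return
        if len(ways) == 2:
            ways.pop(0)                  # assumption: evict the older entry
        ways.append((key, value))

    def lookup(self, key):
        for k, v in self.sets[self._index(key)]:
            if k == key:
                return v
        return None                      # a miss; on the MDP this is handled by system code
```

A later entry that hashes to a full set displaces an earlier one, which is why the text notes that a key can be extracted only "in the absence of later hash collisions".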






1.5 DIAGNOSTIC INTERFACE 



The diagnostic interface is an external interface between the processor core and the host machine. It will 
be discussed in this manual as it is critical to the initiation and execution of core programming systems. It 
supports the ability to download the initial program core, to inspect the contents of memory, to start and 
to stop the execution of the processor. 

1.6 SOFTWARE DEVELOPMENT 

There are currently two primary programming environments for the J-Machine; assembly language and 
Concurrent Smalltalk (CST). Other systems are under development including Concurrent Aggregates and 
Dataflow computation. There are two simulators for small assemblies of processors; the instruction-level 
simulator MDPSim, and the register-transfer level simulator mdp.test. An interpretive 'scripting' language
and monitor, Jmon, provides uniform interactive use of a J-Machine or of the simulators.

1.7 THE PROTOTYPE J-MACHINE 

The initial prototype of the J-Machine parallel computer will consist of at least 1024 MDPs connected by
a high-performance three-dimensional mesh network. Each processing node can include up to 1 Mword of
private memory in 36 bit words; for the initial J-Machine this is currently limited to 260 K words. The machine
will include additional nodes that support peripherals such as graphics displays and disk drives.






Chapter 2 



PROGRAMMING MODELS OF 
THE MDP 

One of the difficulties of describing the organisation and operation of the Message Driven Processor is that 
it has been designed to be a platform for implementing a range of parallel programming models in the 
context of a general purpose fine-grained parallel computer. The design seeks to identify and implement core 
mechanisms that are expected to be valuable to the efficient operation of many of these possible programming 
models. The design decisions have evolved in a manner that tends to shape the lowest levels of these systems 
into a particular form of interaction, but care has been taken to ensure that sufficient flexibility has been 
retained to allow explorations in the neighborhood of this style. 

A consequence of this flexibility is that the processor includes several modes of operation, and that the 
personality of the program becomes subtly different as the user selects among the modes. Typical programs 
will use each of the modes at different times during the execution and therefore most users need to understand 
the use of the major control registers and flags. 

The different characters of the processor complicate an introductory discussion of the features provided
because there seem to be so many exceptions to be described and appreciated. Even the apparently simple 
task of defining the operation of the ADD instruction is hampered by the different behaviors elicited by the 
current mode of operation. Rather than attempting to simply enumerate the list of registers, instructions, 
and modes while noting the different behaviors obtained in each mode, we will discuss the major styles of 
operation and the manner in which the basic mechanisms support these programs. The initial discussion 
will focus on the standard low-level programming model that the processor was designed to support. The 
alternative behaviors will be described in detail later. Chapter 3 summarises, integrates, and unifies the 
description and should be used as the programmer's reference once the basic operation is understood. 

2.1 THE MESSAGE PASSING MODEL 

The Message Driven Processor includes a number of mechanisms designed to assist in the development and
debugging of higher-level programming systems on a fine-grained parallel computer. Fundamental among
these are the tight coupling between the processor core and a high performance communication network,
efficient hardware dispatching of threads, use of tagged words to provide dynamic type-checking and to
support synchronisation primitives, the use of segment-descriptors to provide bounds checked access to
memory 'objects', the provision of fast-trapping to user/system code to support extended data types and
exceptional cases, and rapid context switching.

It is important to note that the MDP is designed as a fine-grained node in a parallel computer. This means 
that it should be possible to build a complete processing node from just a few chips, thereby allowing many 
nodes to be placed in a small volume, and that the processor be able to execute sequential threads of just 50 - 
100 instructions. A typical application is expected to partition the data set into many thousands of relatively 
small objects that can be distributed across the nodes of the machine thereby enhancing opportunities for 
parallel access. 

2.2 THE NODE ID 

Every processor in the J-Machine has a unique node number that is assigned during system initialisation.
The node id is a 16 bit integer that indicates the <x y z> cartesian coordinates of the node within the cube.
The representation supports up to 32 nodes in each of the x and y dimensions, and up to 64 planes in the
z dimension. The node number is stored in the Node Number Register, NNR, of the MDP. It must also be
'programmed' into the routers during system initialisation as described later.
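
The limits of 32, 32 and 64 nodes correspond to 5, 5 and 6 bits, which together fill the 16 bit node id. The following Python sketch packs and unpacks such an id. The ordering of the fields within the word (x in the low bits, then y, then z) is an assumption made for illustration; this section does not state it, and the function names are hypothetical.

```python
X_BITS, Y_BITS, Z_BITS = 5, 5, 6   # 32 x 32 x 64 nodes, 16 bits in total


def pack_node_id(x, y, z):
    """Combine <x y z> coordinates into a 16 bit node id."""
    if not (0 <= x < (1 << X_BITS) and 0 <= y < (1 << Y_BITS) and 0 <= z < (1 << Z_BITS)):
        raise ValueError("coordinate out of range")
    # Assumed field order: x in the low bits, then y, then z.
    return x | (y << X_BITS) | (z << (X_BITS + Y_BITS))


def unpack_node_id(node_id):
    """Recover the <x y z> coordinates from a node id."""
    x = node_id & ((1 << X_BITS) - 1)
    y = (node_id >> X_BITS) & ((1 << Y_BITS) - 1)
    z = (node_id >> (X_BITS + Y_BITS)) & ((1 << Z_BITS) - 1)
    return x, y, z
```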

2.3 OBJECTS 

It is important to appreciate that the words 'object' and 'message', as used to describe the operation of
the MDP, do not refer to precisely the same concepts as these same words when applied to certain
higher-level languages such as C++, Smalltalk, or CLOS. There is certainly a common thread to the ideas, and
the mechanisms supported by the MDP have been influenced by the style of interaction advanced by this
increasingly popular model of program organisation. However, the use of the terms on the MDP necessarily
refers to more primitive mechanisms.

An object (also called a segment, vector, or structure) is simply a bounded sequence of tagged words.
Access to an object is achieved through an address, or segment descriptor, located in one of several special
purpose registers on the MDP. An address indicates both the base and length of the object. All accesses are
bounds checked and a trap to a user specified routine is invoked if the bounds are exceeded.

A runtime system may support the relocation of certain classes of objects. This might consist of copying 
the object to another location within the current processor or may involve the migration of the object from 
one MDP to another. In such a system, it will not be appropriate to pass physical addresses of the objects 
among the nodes or even to store an address within a single processor for an extended time. Instead it is 
preferable to generate a name, or handle, for the object and to use this in all exchanges. System tables are 
then used to translate the name to a physical address as needed. One common scheme is to dictate that 
every object is owned by a particular node; the home node. The node number of the owner is combined 
with a unique index among all the objects owned by that node as the name of the object. The home node 
is then responsible for knowing the current location of its objects and forwarding requests if the object has 
been relocated to another node. 
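
A minimal Python sketch of this naming scheme follows. It uses the layout assumed in the worked example of section 2.8, with the home node number in the low 16 bits of the name and the per-node index in the high 16 bits. The function names are hypothetical, and a real runtime system is free to choose a different format.

```python
NODE_MASK = 0xFFFF    # low 16 bits: home node number
INDEX_MASK = 0xFFFF   # high 16 bits: index unique on the home node


def make_object_id(home_node, index):
    """Build a 32 bit object name from its home node and a per-node index."""
    if not 0 <= home_node <= NODE_MASK:
        raise ValueError("home node must fit in 16 bits")
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("index must fit in 16 bits")
    return (index << 16) | home_node


def home_node_of(object_id):
    """The node responsible for knowing where the object currently lives."""
    return object_id & NODE_MASK


def index_of(object_id):
    """The index that distinguishes this object on its home node."""
    return (object_id >> 16) & INDEX_MASK
```

Because the home node occupies the low bits, a name of this form can be handed directly to an operation that only examines the low 16 bits when choosing a destination.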

There are many possible enhancements on the use of objects. It is desirable to copy certain immutable 
objects, e.g. code blocks, to other nodes thereby increasing the parallel bandwidth to the object and to 
enhance locality. It is also possible to create a distributed object. This type of object is no longer a simple 
vector of words but is instead partitioned into a number of constituent parts which may be distributed 
around the machine. Once again, the goal is to increase the opportunity for parallel access to the object. 

2.4 MESSAGES 

All communication and coordination among the processing nodes occurs as a result of the exchange of 
messages among them. Sending a message from one node to another has two effects; it communicates data 
between the nodes, and it schedules work to be done on the target node. A message is a small data object 
consisting of a header word that indicates the initial entry point for the code to be executed and a number 
of arguments that will be needed by the computation. The length field of the header word of the message 
also indicates the number of arguments supplied. This field is used to validate all accesses to the arguments 
of the message. The request might be something simple such as to look up the value of a slot in a data object
stored in the memory of the target node, or it may initiate an extensive sequence of calls and secondary 
message sends. 

Consider the simpler case just mentioned; a message sent to request the value of a slot in a data object. 
This message might be of the following form 



        MSG     GET-SLOT  5
        ID      NAME OF OBJECT
        INT     OFFSET
        ID      OBJECT FOR RESULT
        INT     OFFSET



The first word of the message contains the physical address of the start of the code within the memory 
of the target processor, and the length of this message. The second word, the first argument, is a reference 
to the object. This could be the physical address of the start of the object in some simple programming 
systems, but in general we will think of this as a unique name for the object known to both the source of 
the message and the target. This name will be translated into a physical address by the destination. The 
second argument is simply the offset desired within the object. The third and fourth arguments indicate
where the result should be sent. 






Messages sent between nodes travel in a deterministic path between the nodes until the destination is
reached. This scheme is known as "Manhattan Routing". Messages travel in the x-dimension, then the
y-dimension, and finally the z-dimension. There are two possible priorities for a message: priority zero and
priority one. The priority is assigned by the sending node and priority one messages receive preference during
routing. When the message reaches the destination it is stored in the appropriate message queue by the
network interface without intervention by the integer core. The message queues are implemented as circular
buffers. The size and location of each queue is determined by the initialisation routines and can be altered
at runtime if necessary although this is not expected to be common.
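
The deterministic path can be sketched in a few lines of Python. This is only an illustration of the dimension ordering described above. It assumes a mesh without wraparound links and one hop per step, and the function name is hypothetical; it says nothing about how the routers themselves are implemented.

```python
def route(src, dst):
    """Return the nodes visited from src to dst: x first, then y, then z.

    src and dst are (x, y, z) tuples. The result includes both end points.
    """
    path = [tuple(src)]
    cur = list(src)
    for axis in range(3):                      # 0 = x, 1 = y, 2 = z
        step = 1 if dst[axis] > cur[axis] else -1
        while cur[axis] != dst[axis]:
            cur[axis] += step
            path.append(tuple(cur))
    return path
```

For example, a message from node (0, 0, 0) to node (2, 1, 0) first takes two hops in x and then one hop in y; it never moves in y before the x coordinate matches.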

2.5 PRIORITIES 

The processor can execute instructions in one of three possible priorities: the background, priority zero, and
priority one. Many of the machine registers are replicated for each priority thus permitting the processor to
switch rapidly among these priorities without saving state.

The processor is initially in the background. If a message arrives at priority zero or priority one, the
processor will switch from the background to the appropriate priority. The instruction pointer will be set
from the header word of the message and one of the address registers, explained in a moment, will be set to
point to the message. The processor will then begin executing instructions. If the processor is executing at
priority zero and another priority zero message arrives, it will be added to the priority zero message queue.
When the processor completes the current message, by executing the SUSPEND instruction, the message
will be removed from the queue and the next message will be processed. If the processor is executing at
priority zero and a priority one message is received, the processor will switch to priority one and run the
new message. When that message completes, the MDP will resume the priority zero message.

The current priority is indicated by the state of 2 bits: the B-bit and the P-bit. The B-bit is set when
the MDP is executing in the background. If the B-bit is clear and the P-bit is clear then the processor is in
priority zero. If the B-bit is clear and the P-bit is set then the processor is in priority one.
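
The encoding of the two bits, and the order in which pending work is chosen, can be summarised in a short Python sketch. The function names are hypothetical. Treating a set B-bit as "background" regardless of the P-bit is an assumption, since the text only describes the P-bit for the case in which the B-bit is clear.

```python
def current_priority(b_bit, p_bit):
    """Decode the B-bit and P-bit into the name of the executing priority."""
    if b_bit:
        return "background"   # assumption: the P-bit is ignored when B is set
    return "priority one" if p_bit else "priority zero"


def select_next(p0_queue, p1_queue):
    """Choose what runs next: priority one before priority zero, else background."""
    if p1_queue:
        return "priority one"
    if p0_queue:
        return "priority zero"
    return "background"
```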

2.6 TYPES 

The Message Driven Processor incorporates explicitly tagged data values in both memory and the processor 
core. Words in memory and in most of the machine registers consist of 36 bits; 32 bits represent the value, 
and 4 bits indicate the type. There are only 13 distinct types; four of the possible types are proper subsets 
of the instruction type. 

2.6.1 Symbol 

A symbol contains an atomic value. Symbols can only be compared with each other to determine whether 
they are the same symbol or not. There is a special symbol NIL in which the data portion is all zeros.

   35    32 31                              0
  | 0 0 0 0 |        Value (0 = NIL)         |  SYM

2.6.2 Integer 

This type is the conventional 2's complement integer in the range -2^31 to 2^31 - 1. A full set of arithmetic,
logical, and comparison operations are defined on integers.

   35    32 31                              0
  | 0 0 0 1 |     Two's Complement Value     |  INT

2.6.3 Boolean 

There are two members of this set: TRUE and FALSE. They are distinguished by the least significant bit in
the word. All the other locations should be zero. This type most commonly arises as the result of comparison
operations. They may also be used as operands to logical operations.

   35    32 31                          1   0
  | 0 0 1 0 | 0                          0 | b |  BOOL



2.6.4 Address 

The address type contains a base/length pair and two single bit fields. Indexed accesses relative to an
address value are bounds checked against the length field of the word. If the length field is zero, then bounds
checking is not performed. The 20 bit base field supports a 1 Mword address space, and the 10 bit length
field permits bounds checked segments of between 1 and 1023 words. A length field of zero indicates an
object of unspecified length. The r and i bits are explained later.

   35    32  31  30 29              10 9           0
  | 0 0 1 1 | r | i |      base       |   length   |  ADDR
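
The following Python sketch shows how an ADDR word might be assembled and how an indexed access is checked against its length field. The field widths follow the text above (20 bit base, 10 bit length, tag 0011). Placing r in bit 31 and i in bit 30 is an assumption read from the order of the fields in the diagram, and the function names are hypothetical.

```python
TAG_ADDR = 0b0011


def pack_addr(base, length, r=0, i=0):
    """Build a 36 bit ADDR word: tag | r | i | base (20 bits) | length (10 bits)."""
    if not 0 <= base < (1 << 20):
        raise ValueError("base must fit in 20 bits")
    if not 0 <= length < (1 << 10):
        raise ValueError("length must fit in 10 bits")
    return (TAG_ADDR << 32) | (r << 31) | (i << 30) | (base << 10) | length


def index_address(addr_word, offset):
    """Return the physical address of word `offset` of the segment.

    Raises IndexError where the hardware would signal a bounds fault.
    A length field of zero disables the check, as described in the text.
    """
    base = (addr_word >> 10) & 0xFFFFF
    length = addr_word & 0x3FF
    if length != 0 and not 0 <= offset < length:
        raise IndexError("bounds fault")
    return base + offset
```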

2.6.5 Instruction Pointer 

This data type is appropriate for loading into the instruction pointer register either directly or as the result 
of an exceptional situation or call. The offset field is analogous to the base field of an address, i.e. provides 
for accesses into the 1 Mword address space of the MDP. The other fields are explained in more detail later.

   35    32 31 30 29              10 9  8 7        0
  | 0 1 0 0 |  |  |     offset      |  |  |         |  IP



2.6.6 Message 

This type is stored in the first word of a message. The upper fields of the header are loaded directly into the 
instruction pointer as part of the dispatch mechanism. The length field indicates the length of the message 
including the header word itself. The u and f bits are explained later. 



   35    32  31  30 29              10 9           0
  | 0 1 0 1 | u | f |     offset      |   length   |  MSG
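
As a worked illustration of this layout, the Python sketch below assembles and decodes a header word. The bit positions are read from the diagram above; the function names and the entry address used in the example are hypothetical.

```python
TAG_MSG = 0b0101


def pack_msg_header(offset, length, u=0, f=0):
    """Build a 36 bit MSG header: tag | u | f | offset (20 bits) | length (10 bits)."""
    if not 0 <= offset < (1 << 20):
        raise ValueError("offset must fit in 20 bits")
    if not 0 <= length < (1 << 10):
        raise ValueError("length must fit in 10 bits")
    return (TAG_MSG << 32) | (u << 31) | (f << 30) | (offset << 10) | length


def unpack_msg_header(word):
    """Split a header word back into its fields."""
    return {
        "tag": (word >> 32) & 0xF,
        "u": (word >> 31) & 1,
        "f": (word >> 30) & 1,
        "offset": (word >> 10) & 0xFFFFF,
        "length": word & 0x3FF,
    }


# A GET-SLOT request carries the header plus four arguments, so its length is 5.
GET_SLOT_ENTRY = 0x1234   # hypothetical address of the handler code
header = pack_msg_header(GET_SLOT_ENTRY, 5)
```

Note that the length counts the header word itself, so a message with four arguments has a length field of 5.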

2.6.7 Context Future 

This is a special marker used to support efficient synchronisation of parallel tasks. It is stored into structures 
prior to forking a parallel thread and, in normal use, forces an exception if the location is read again before 
the expected value has been determined. The trapping behavior is designed to prevent cfut-tagged values 
from migrating to another node or being stored in general data-structures. 



   35    32 31                              0
  | 0 1 1 0 |          user-defined          |  CFUT

2.6.8 Future 

This is a more general synchronisation marker. It is expected to be used to indicate a first class data 
structure; one that can be stored in structures, passed as an argument to a function, and so on. 

   35    32 31                              0
  | 0 1 1 1 |          user-defined          |  FUT






2.6.9 User Defined

The types TAG8, TAG9, TAGA, TAGB indicate user defined data types of unspecified format. Primitive
operations on these types provide a direct trap to a unique handler for each type.

   35    32 31                              0
  | 1 0 0 0 |          user-defined          |  TAG8
  | 1 0 0 1 |          user-defined          |  TAG9
  | 1 0 1 0 |          user-defined          |  TAGA
  | 1 0 1 1 |          user-defined          |  TAGB



2.6.10 Instruction Type 

The types INST0, INST1, INST2, INST3 represent instruction types. This representation permits two 17-bit
instructions to be stored in each word by overlapping the low two bits of the type with the high two bits of
the first instruction.

   35 34 33                    17 16                    0
  | 1 1 | 0 0  first instruction  |  second instruction  |  INST0
  | 1 1 | 0 1  first instruction  |  second instruction  |  INST1
  | 1 1 | 1 0  first instruction  |  second instruction  |  INST2
  | 1 1 | 1 1  first instruction  |  second instruction  |  INST3
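
The tag assignments of section 2.6 can be gathered into a single table. The Python sketch below decodes the tag of a 36 bit word and splits an instruction word into its two 17-bit instructions. The tag values are those shown in the diagrams of this section; the function names are hypothetical.

```python
# Tag names indexed by the 4 bit tag value in bits 35-32.
TAG_NAMES = [
    "SYM", "INT", "BOOL", "ADDR", "IP", "MSG", "CFUT", "FUT",
    "TAG8", "TAG9", "TAGA", "TAGB", "INST0", "INST1", "INST2", "INST3",
]


def tag_name(word):
    """Return the type name of a 36 bit tagged word."""
    return TAG_NAMES[(word >> 32) & 0xF]


def split_instructions(word):
    """Return the (first, second) 17-bit instructions held in an INST word.

    The first instruction occupies bits 33-17, so its two high bits are
    also the two low bits of the tag.
    """
    if (word >> 34) & 0b11 != 0b11:
        raise ValueError("not an instruction word")
    first = (word >> 17) & 0x1FFFF
    second = word & 0x1FFFF
    return first, second
```

The overlap explains why four tag values are consumed by instructions: once bits 35 and 34 are both set, bits 33 and 32 are free to carry instruction bits.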



2.7 THE DATA AND ADDRESS REGISTERS 

The MDP contains four general data registers, R0-R3, and four address registers, A0-A3, for each of the
three priorities. The four general data registers are 36 bits long and can store any object of any type. R0 is
used for the creation of long constants and is therefore a little less useful for storing computation temporaries
than the other three registers.

The four address registers only store addresses. Although they are effectively 36 bits long, the upper four
bits always read as ADDR. It is generally considered to be an error to attempt to write a non-address type
into an address register. An attempt to do so will either cause a trap or simply be ignored depending on the
mode of the processor.

When a message is dispatched, address register A3 is set to the start of the message. This allows
the code to read the arguments contained in the message. It is possible to use A3 for other purposes if
the computation no longer requires the contents of the message. However this requires certain flags to be
adjusted appropriately as will be discussed later.

Address register A0 is used as the base register for instruction fetching and may be used for reading and
writing data. The special uses of this register will be described in more detail later.



2.8 HANDLING A SIMPLE MESSAGE 

Consider again the case in which a message is sent from one node to another to request the contents of a 
particular slot in an object. The message will be stored in the appropriate message queue, P0 or P1, on
the target processor until the MDP is ready to execute it. At that point, address register A3 is initialised 
so that it points to the next message in the message queue, the length field is extracted from the header of 
the message and loaded into the length field of A3, and the instruction pointer is loaded from the physical 
address stored in the header. The code to be executed might be of the following form: 

get-slot:       ;;; Extract the value from the object
        move    [1,a3], r2      ; Fetch the name of the segment into r2
        move    [2,a3], r3      ; Fetch the offset into r3
        xlate   r2, 0, a1       ; Translate the name to the physical address
        move    [r3,a1], r3     ; Fetch the data at the required offset

        ;;; Generate the header word of the message, an immediate constant
        dc      MSG:(reply.value << addr.base.pos) | 4

        ;;; Extract the name of the object waiting for the result
        move    [3,a3], r1      ; Fetch the name of the receiver

        ;;; We can use this value directly but we demonstrate bit-masking here
        wtag    r1, INT, r2     ; Convert it to an integer
        and     r2, $FFFF, r2   ; Mask off the low 16 bit NNR

        ;;; Send the reply at priority zero
        send2   r2, r0, 0
        send    r1, 0
        send    [4,a3], 0
        sende   r3, 0

        ;;; Done
        suspend



This code has been written using the syntax of the assembler that is incorporated into the MDP instruc- 
tion level simulator MDPSim. Later sections of this document describe the instruction set of the MDP more 
fully and the use of MDPSim. 

The first two move operations extract the name of the object and the onset within it and store them 
in the general registers R2 and R3. The xlate operation performs an associative lookup within the local 
name cache to translate the name to the physical address of the object. If the name is not found in the 
cache, either because the name is bad or because the cache entry has been reused, the MDP will trap to 
the xlate-fault handler. This handler will then attempt to resolve the name using a set of system tables. 
The handler may choose to use the instruction's second argument, a constant integer between 0 and 3, to 
assist in the search for the key. In either case, if the name can be translated, the result will be placed in 
A1. Finally the value is fetched and stored in R3. Note that this instruction will fault if the specified offset
is too large for the referenced object. This error might cause the processor to send an error warning to the 
console and then idle to permit the programmer to inspect the state of the MDP at the time of the error. 
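The lookup and fetch described above can be pictured with a small Python model. This is only a sketch: the dictionary standing in for the name cache, the `system_tables` fallback, and the names `XlateFault` and `get_slot` are illustrative assumptions, not the MDP hardware or the actual xlate-fault handler.

```python
class XlateFault(Exception):
    """Raised when a name cannot be resolved at all (hypothetical name)."""


def xlate(name, name_cache, system_tables):
    """Translate an object name to a (base, length) pair.

    name_cache models the associative cache; system_tables models the
    slower tables consulted by the fault handler. Both are plain dicts.
    """
    if name in name_cache:
        return name_cache[name]                 # cache hit
    if name in system_tables:                   # miss: handler resolves it
        name_cache[name] = system_tables[name]  # refill the cache entry
        return name_cache[name]
    raise XlateFault(name)                      # the name is bad


def get_slot(memory, name, offset, name_cache, system_tables):
    """Model of the first four instructions of the get-slot handler."""
    base, length = xlate(name, name_cache, system_tables)
    if not 0 <= offset < length:
        raise IndexError("offset outside object")  # models the bounds fault
    return memory[base + offset]
```

A cache miss followed by a successful table lookup leaves the entry in the cache, so a second request for the same name takes the fast path.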

Generating a reply message consists of SENDing a sequence of words to the network interface. A strength 
of the MDP is that the send instructions are normal operations in the processor that can utilise all the 
standard addressing modes of the instruction set. The first word to be sent is the NNR of the destination. 
The remaining words are the header and then arguments of the message. The instruction set allows short 
integers and certain special constants to be generated and moved into any general register in one instruction 
but long constants cannot be formed as easily. Instead the DC instruction is used to define an inline long 
constant and the resulting value is stored in R0. In this case we have formed a long constant whose tag field
is the type MSG, and that contains the physical address of the reply code and the length of this message in 
the appropriate fields. 

We assume that an object ID consists of the home node number in the low 16 bits of the name, and an 
integer unique to that object on that node in the high 16 bits as discussed earlier. This format is designed 
to permit it to be used directly as the destination since the send instruction ignores all but the low 16 bits 
in the first word for a message and only traps when the data is of type CFUT. However, the code shown 
demonstrates the use of bit-masking for expository purposes. First the name is converted to be of type INT 
using the wtag instruction, to avoid a type-fault when the bit-masking is done. We are careful to leave the 
name in R1 and compute the node number in R2. We note that the wtag instruction and the bitwise-and
instruction have a similar format. The first operand is a general data register as is the destination. The 
second argument is more flexible. For the wtag instruction we have exploited the ability to refer to any small 
constant in the range -16 to 15 directly. The AND instruction has exploited the ability to refer to one of the 
8 special immediate constants. 
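The object ID layout assumed in this paragraph, and the effect of the AND with $FFFF, can be sketched in Python. The function names are illustrative only; the 16/16 split is the one stated in the text.

```python
NODE_BITS = 16
NODE_MASK = (1 << NODE_BITS) - 1   # 0xFFFF, the mask used by the AND above


def make_object_id(node: int, serial: int) -> int:
    """Pack a home node number (low 16 bits) and a per-node integer (high 16 bits)."""
    return ((serial & NODE_MASK) << NODE_BITS) | (node & NODE_MASK)


def home_node(object_id: int) -> int:
    """Recover the home node number, as the wtag/and pair does."""
    return object_id & NODE_MASK


def serial(object_id: int) -> int:
    """Recover the integer that is unique to the object on its home node."""
    return (object_id >> NODE_BITS) & NODE_MASK
```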

The destination node number and the message header, in R2 and R0 respectively, are sent to the network
interface at priority 0 using a single instruction. The next two instructions send the name of the object awaiting
the result value and the offset where the result should be written. The final send instruction transfers
the value itself and indicates that there are no more arguments. The processor then executes the suspend
instruction which pops the message from the queue and permits the MDP to initiate the next available 
message. 

Note again that the header of the message must contain a physical address on the target processor. In 
general this is achieved by ensuring that all the message handlers are placed in exactly the same place on 
every node. Beware of a common error in which two programs that are almost the same and have the same 
set of labels, but at slightly different offsets, are placed on two different nodes. It is also possible to develop 
a system in which a few core routines are common to all the nodes and in which nodes that need to establish
access to specialised routines can pass the needed addresses dynamically during initialisation. Ultimately,
changes to the linker can be used to support repeated but specialised needs. In practice, this issue is rarely 
the problem that one might imagine so long as one is aware of the problem. 

In this very simple example, we have encountered many of the basic features of the MDP. We have seen 
the handling of a simple message, have seen that typical instructions can use arguments stored in a general 
register, in an object pointed to by an address register, or a special constant. We have seen possible situations 
in which a trap can occur. 

2.9 A0-RELATIVE ADDRESSING

We have noted that the MDP supports the use of relocatable objects and have even specified that code blocks 
can also be objects, and yet the example message handler just discussed clearly did not take advantage of 
this feature. In fact we went to great lengths to explain that one had to exercise appropriate care to ensure 
that the correct physical address was stored in the message header. We will now see how to exploit the use 
of relocatable code blocks and A0-relative addressing to enhance the use of message handlers.

It is expected that many 'interesting' applications will contain significantly more code than can be stored 
at one time on a single node. It is then necessary that the programming system be able to purge local memory
of user code that is no longer likely to be needed and to bring copies of the required blocks to the node. In
such a system, the majority of the code blocks are implemented as objects. The lowest level of the runtime 
system, which is identical on all nodes, translates the name of a function to a pointer to the current location 
of the code object and then jumps to the start of the code. Consider a user-level function to compute the 
dot-product of two vectors. This function is to be invoked on a node by sending a message of the form 



        MSG     APPLY-FUNCTION  6
        ID      DOT-PRODUCT
        ID      VECTOR 1
        ID      VECTOR 2
        ID      CONTEXT FOR RESULT
        INT     OFFSET



When this message is received, the kernel code for apply-function is called. This translates the name 
DOT-PRODUCT into a pointer to the code block. If the code block is not currently available on this node, 
the runtime system will request that a copy be sent to the node and then suspend the task. When the copy 
arrives, the function application will proceed. 

We have already noted that A0 is used as the base of all instruction fetches and can also be used in data
reads and writes but this did not appear to play a role in the first example. It will explicitly do so in this
case. Register A0 can be used in two ways: A0-absolute and A0-relative. In the former mode, any reference
through A0, whether implicitly as for instruction fetching or explicitly when used to fetch an instruction's
operand, ignores the actual value in A0 and treats it as though it points to a segment at the start of memory
and of indefinite length. In this mode the offset stored in the instruction pointer is the absolute address of the
next instruction word to be fetched. If the user selects A0-relative mode, then A0 points to the beginning
of a code block and the offset in the IP is a small integer relative to the beginning of the code block. When
a message is dispatched, the processor is set to A0-absolute mode and the offset stored in the header of the
message is stored in the IP. Let us examine a representative version of the code to implement apply-function.

apply-function:
        ;;; Get the pointer to the code block
        move    [1, a3], r0
        xlate   r0, 0, a0

        ;;; Jump to the start of the code within the code object
        dc      ip:(Foffset << ip.offset.pos)
        move    r0, ip

This code is very close to the code used by the Concurrent Smalltalk runtime system COSMOS. The
first two instructions set A0 to point to the code object. If this object is not present, the xlate-fault handler
will request a copy of the code and then suspend this task. Altering A0 does not affect the execution of the
current code because its value is being ignored for both instruction and data accesses. The third instruction
prepares a pointer of type IP with an offset to the start of the code within the code object, i.e. a small integer
offset based on the data format of a code object. The A0-absolute flag is clear in this new word and so, when
it is finally loaded into the IP, the processor begins executing code at the appropriate offset within the code
object. Now any data reads or writes of the form

        add     r0, [r1, a0], r2

will be relative to the code block rather than to the absolute address contained in r1.
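The difference between the two A0 modes comes down to one address calculation. The Python sketch below is an illustrative model only: the function name and arguments are assumptions, and the length check that the hardware applies in relative mode is left out.

```python
def effective_address(offset: int, a0_base: int, a0_absolute: bool) -> int:
    """Return the physical word address for a reference made through A0."""
    if a0_absolute:
        # A0 is ignored: the offset is already a physical address.
        return offset
    # A0-relative: the offset is counted from the start of the code object.
    return a0_base + offset
```

With a code object at $2000, an offset of $10 refers to word $10 in A0-absolute mode and to word $2010 in A0-relative mode, which is why the same handler text keeps working after the code object has been moved.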

2.10 A3-RELATIVE ADDRESSING

We have seen that when the MDP dispatches a new message, A3 is set to the start of the message. We can 
then use this pointer to access the arguments. There are two slightly subtle issues to consider. We know 
that messages are stored in a circular buffer. What happens when the new message does not quite fit in 
the region left between the current queue pointer and the end of the buffer? Also, what happens when a 
message that has been suspended and saved in main memory is finally resumed? These two questions are 
somewhat related. 

Let us consider the first question: the case in which the message "wraps around" the message queue.
Words of the message are collected and stored normally until the end of the buffer is reached and then the 
remaining words are stored at the start of the buffer. However it is clear that the user does not want to deal 
with this issue and so the MDP provides the illusion of a sequential set of words. Accesses through A3 are 
compared against the base and length indicated by the register that points to the entire message buffer, and 
accesses past the end of the queue are translated to the appropriate offset at the start. This is achieved at 
the same speed as a standard memory access. 

Now we examine the second issue; that of resuming a suspended thread. Consider the previous example 
in which a message to invoke dot-product was processed. If the code block is not available on the current 
node, then the message must be copied from the message queue into another area of the MDP memory so 
that the arguments will be available when the code is delivered. The MDP issues a request for the code 
object and then suspends the current message which causes it to be removed from the message queue. Later, 
perhaps after the MDP has executed dozens of other messages, a reply arrives delivering the code object.
The code object itself is copied from the message and then the original thread is resumed. A3 currently 
points to the message that provided the code but this code expects it to point to the original one. We must 
therefore adjust A3 so that it points to the current location of the message. However, accesses via A3 will 
now point beyond the end of the queue and so we must prevent the wrap-around feature from being used. 

The Q-flag provides exactly this control. When it is set, as it is during a message dispatch, accesses 
relative to A3 are checked against the bounds of the message queue. When it is cleared, the bounds of the 
queue are ignored. It is clear that care must be taken during the interval in which A3 is to be redirected 
and Q is cleared. 
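The effect of the Q-flag on an access through A3 can be sketched in Python. This is a simplified model under stated assumptions: the queue length is a power of two, the queue base is a multiple of that length, addresses are word addresses, and the check against the length field of A3 is omitted. The function and parameter names are illustrative.

```python
def a3_address(a3_base: int, index: int, queue_base: int,
               queue_size: int, q_flag: bool) -> int:
    """Physical address of word `index` of the message that A3 points to."""
    if queue_size <= 0 or queue_size & (queue_size - 1):
        raise ValueError("queue size must be a power of 2")
    if not q_flag:
        # Q clear: A3 is an ordinary segment, for example a saved copy of
        # the message in main memory. No wrap-around is applied.
        return a3_base + index
    # Q set: offsets past the end of the queue wrap to its start.
    mask = queue_size - 1
    return queue_base | ((a3_base + index) & mask)
```

For a 32-word queue at $400 and a message that starts at $418, word 10 is found at $402 when Q is set. The same index applied to a copy saved at $800 with Q clear gives $80A.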

2.11 FAULTS 

The MDP defines 19 fault conditions and orders them such that a single handler will be selected if more 
than one fault occurs at the same time. When a fault is detected, such as a type-fault or an xlate-fault, the 
current state of the computation must be saved and then the appropriate handler must be selected. 

There are four registers associated with a fault. The Faulted Instruction Pointer, FIP, stores the value 
in the IP at the time of the fault. This will be used to resume the instruction sequence when the fault has 
been handled. If the fault is associated with the execution of an instruction, then the current instruction 
is stored in the Faulted Instruction Register, the FIR, and the Faulted Operand registers FOP0 and FOP1 
will be updated. The fault handler can inspect these registers to determine how to proceed. 

There are two fault vectors, one for each of the message priorities, stored at known locations in the lower 
part of the MDP memory. Each vector contains 32 slots that are indexed by the fault number associated 
with the fault. These vectors must be set as a part of the general system initialisation. 

2.11.1 Asynchronous Faults 

There are two fault-types that occur asynchronously to the execution of the instruction stream; external 
interrupts, and queue overflow traps. The MDP contains a simple diagnostic port that can be used to read 
and write the memory of the node, to start and stop the processor core, and so on. One of the commands 
causes the interrupt fault to be generated at the end of the current instruction. 

A queue-overflow is signalled when a message arrives but the target message queue is already full. Han- 
dling of this fault may be delayed if a switch to the required priority is disabled. In this case, the network 
backs up until it can be serviced. The associated fault handler will move a portion of the queue to external 
memory and then restart the current message. 

Care must be taken when writing programs to account for such possible asynchronous faults. As a 
minimum such a fault will alter the FIP. It is possible to disable such faults during critical regions of code
by setting the Interrupt Bit, the I-bit, or the Fault Bit, the F-bit, appropriately. These flags are discussed
more fully later. 



2.11.2 Calls 

The MDP also provides a simple call mechanism that is based closely on the fault mechanism. The call
instruction takes a single operand, an integer, and uses it as an index into a table of 64 possible calls. The 
IP is saved in the FIP as for a fault. 

2.11.3 Unchecked Mode 

To increase the flexibility of the MDP and to increase the efficiency of certain sequences of low-level code, it 
is possible to disable most of the checking performed by the MDP. The major effect of executing in unchecked 
mode is that the types of operands will be ignored. This mode is controlled by the Unchecked Flag, or U-bit, 
contained in the current instruction pointer. 

2.12 NETWORK INTERFACE 

The Message Driven Processor contains a high-performance 3 dimensional router that functions indepen- 
dently of the processor core. Messages are transmitted through the current node or added to the node's 
message queues without interrupting the integer unit. The programmer is required to perform certain initial- 
isation tasks to configure the message queues, and must be prepared to handle four possible fault conditions 
as a result of processing messages. 

2.12.1 Message Queues 

Incoming messages are queued in message queues before being dispatched and processed. There are two 
message queues, one for each priority level. Each message queue is defined by two registers: the QBM, queue 
base/mask register, and the QHL, or queue head/length register. The queue base/mask register defines the 
absolute position and length of the queue in memory. In order to simplify the hardware, the length must be 
a power of 2, and the queue must start at an address that is a multiple of the length. The queue head/length 
register specifies which portion of the queue contains messages that have been queued but not processed 
yet (including the message not yet dequeued by SUSPEND). To avoid having to copy memory, the queue 
wraps around; if a twenty-word message has arrived and only eight words are left until the end of the queue,
the first eight words of the message are stored until the end of the queue, and the next twelve are stored 
at the beginning. The queue head/length register contains the head and length of the queue instead of the 
head and tail to simplify the bounds-checking hardware involved in checking user program references to the 
queue. Below is a diagram of a queue with one message being processed, one more waiting, and a third one 
arriving. 




Figure 2.1: A Message Queue. 
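The placement rules for a queue, and the way an arriving message is split when it reaches the end of the queue, can be sketched in Python. The helper names are hypothetical, and the sketch does not model queue overflow or the four-word row buffering.

```python
def check_queue(base: int, length: int) -> None:
    """Validate the constraints the hardware places on a queue."""
    if length <= 0 or length & (length - 1):
        raise ValueError("queue length must be a power of 2")
    if base % length:
        raise ValueError("queue base must be a multiple of its length")


def split_message(write_pos: int, msg_len: int, base: int, length: int):
    """Return [(start_address, word_count), ...] for a message stored at write_pos."""
    check_queue(base, length)
    room = base + length - write_pos      # words left before the end of the queue
    if msg_len <= room:
        return [(write_pos, msg_len)]
    return [(write_pos, room), (base, msg_len - room)]
```

For a 64-word queue at $400 with eight words left, a twenty-word message is stored as eight words at $438 followed by twelve words at $400, which matches the example in the text.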






To reduce contention for the shared on-chip memory and to simplify certain aspects of the queue man- 
agement hardware, incoming messages are buffered within the network interface in the Queue Row Buffers 
and written to the memory in blocks of four words. This can cause one, two, or three words to be wasted 
between messages in the queue. This alignment is transparent to the software; the length and head in QHL 
are automatically aligned to multiples of four words by the hardware. The length field of the message header 
specifies the exact length of the message. The effect of this buffering is sometimes visible to users performing 
low-level debugging of the contents of the message queue, or when writing certain low-level system utilities. 
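The space a message occupies in the queue follows directly from the four-word blocks. A short Python sketch of the arithmetic, with illustrative names:

```python
ROW = 4  # words written to the queue per block


def queue_footprint(msg_len: int) -> int:
    """Words of queue space consumed by a message of msg_len words."""
    return (msg_len + ROW - 1) // ROW * ROW


def wasted_words(msg_len: int) -> int:
    """Padding words left between this message and the next one."""
    return queue_footprint(msg_len) - msg_len
```

A six-word message therefore occupies eight words of the queue and leaves two words unused, while an eight-word message leaves none.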

2.12.2 Message Reception 

There are two stages in processing of messages: queueing and execution. A message cannot be queued if the 
D-bit of the associated QBM register is set. This bit is set during reset to allow proper system initialisation 
and may be set by certain kernel system routines. If the D-bit is set, the message will back up into the 
network but no data will be lost. A queue-overflow trap is requested whenever data arrives and the queue is 
already full. It will be handled as soon as the priority is runnable. 

If the processor is currently executing at a lower priority than the new message and interrupts are enabled, 
then the message will be dispatched as soon as the first four words are delivered. The A3 register is written 
with the base field from the QHL and the length field from the bottom 10 bits of the message header. The 
Q bit in the status register is set to allow accesses to messages that are "wrapped around", such as the 
twenty-word message in the example above. 

The B flag is cleared and P is set to the priority at which the message arrived on the network. If the first
word of the message is tagged MSG, then the IP offset and the F and U flags are loaded from the first word
of the message, otherwise a message-fault is signalled. The A0-Absolute bit is set.
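The part of the dispatch sequence that sets up A3 can be sketched in Python. The model below is an assumption-laden illustration: words are represented as a symbolic tag plus an integer, only the 10-bit length field is decoded because that is the only field position given in the text, and the names `Word`, `MessageFault` and `dispatch` are hypothetical.

```python
from typing import NamedTuple


class Word(NamedTuple):
    tag: str    # symbolic tag such as "MSG" or "INT"
    data: int   # the data bits of the word


class MessageFault(Exception):
    """Models the message-fault taken for a badly tagged header."""


LENGTH_MASK = 0x3FF   # bottom 10 bits of the message header


def dispatch(header: Word, queue_head: int):
    """Return the (base, length) pair written to A3 when a message is dispatched."""
    if header.tag != "MSG":
        raise MessageFault("header word is not tagged MSG")
    return queue_head, header.data & LENGTH_MASK
```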

2.12.3 Suspend 

The SUSPEND instruction terminates the processing of the message. First it flushes one message from the 
proper input queue. Then, if another message (of either priority) is ready, it is executed as described in 
the Message Reception section. Otherwise, the processor resumes execution of the background priority. A 
SUSPEND executed in background mode produces indeterminate results. 

Note that every SUSPEND corresponds to exactly one message arrival. This SUSPEND terminates the 
processing of the message and also flushes the message. Therefore, every MDP routine that gets executed 
by a message must terminate with a SUSPEND at some point. 

2.12.4 Message Transmission 

The SEND, SEND2, SENDE, and SEND2E instructions are used to send messages. The first word sent 
specifies the absolute node number of the destination node (i.e. the destination node's NNR value) in the 
low 16 bits. The high 16 bits are ignored. The type is also ignored so long as it is not CFUT in checked 
mode. The third argument of each SEND instruction determines the priority at which the message is to 
be sent over the network: 0 means priority level 0 and 1 means level 1. The priority of the message is 
independent of the priority of the process that is sending it. 

The initial routing word is followed by a number of words which the network delivers verbatim to the 
destination node. The network does not examine the contents of these words other than to verify that 
they are not of type CFUT if the processor is in checked mode. The message is terminated by a SENDE or 
SEND2E instruction, which sends the last one or two words, and tells the network interface that the message 
is complete. The first word that arrives at the destination node (the second word actually sent, since the 
routing word is only used by the network and is stripped off during routing) must be tagged MSG. 
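The framing rules above can be illustrated with a toy Python model of one node's send port. It is a sketch, not the real network interface: tags, priorities, checked mode and flow control are ignored, and the class and method names are assumptions.

```python
class NetworkInterface:
    """Toy model of message framing on the sending side."""

    def __init__(self):
        self.pending = []     # words of the message being composed
        self.delivered = []   # (destination, words) pairs as seen by the receiver

    def send(self, *words, end=False):
        """Model SEND/SEND2 (end=False) and SENDE/SEND2E (end=True)."""
        self.pending.extend(words)
        if end:
            routing, *body = self.pending
            # Only the low 16 bits of the routing word select the node, and
            # the routing word itself is stripped before delivery.
            self.delivered.append((routing & 0xFFFF, body))
            self.pending = []
```

The reply from the get-slot example would be modelled as `ni.send(dest, header)`, `ni.send(receiver)`, `ni.send(offset)` and finally `ni.send(value, end=True)`. Because the model has a single `pending` list, it also shows how words from two interleaved senders would end up in one message.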

There are several issues to be aware of when sending messages. The total time between the first SEND and 
the SENDE should be as short as possible to avoid blocking the network. The user should avoid performing 
significant computation or taking faults during a sequence of SEND operations. A more subtle error can 
occur in programs that use both priority 0 and priority 1. If a dispatch to priority 1 were to occur while 
a thread at priority 0 is in the middle of sending a message, and if that priority 1 thread sends a message, 
then the messages will be concatenated. The solution to this is to explicitly disable interrupts, by setting 
the Interrupt Flag, before the priority 0 task begins sending the message. 

2.12.5 Faults Associated with the Network 

There are four possible faults that can be taken as a result of the use of messages: send-fault, message-fault,
early-fault, and queue-overflow-fault.






2.12.5.1 Send Fault 



This fault occurs when a send instruction is unable to be executed because the network output buffers are 
full. This is normally the result of a temporary congestion problem in the vicinity of the node that can be 
expected to clear again relatively quickly. The classic fault handler pauses, backs up the instruction pointer, 
and retries the send. The network can be backed up for considerable periods of time if a receiver disables one 
of its message queues, such as when handling queue overflow faults. If a node disables its priority 1 queue, 
it will not only back up priority 1 messages but can also halt the reception of priority zero messages if an
earlier priority 1 message has filled the input buffers. A sophisticated retry handler will include a timeout 
feature that detects when the network has been blocked for "too long". This fault occurs regardless of the 
state of the unchecked flag. 

2.12.5.2 Message Fault 

This fault occurs at the destination node if the header word of the message is not of type MSG. It can occur 
as the result of a programming error, as a deliberate effort to handle certain messages specially, or because 
of bit errors in the network media. 

One common programming error that can cause an unexpected message fault is to clear the I-bit to 
accept interrupts without initialising the QHL and QBM registers of both priorities. The MDP constantly 
inspects these registers whenever the I-bit is clear and will mistakenly conclude that a message is available 
if they are not set properly. This error usually occurs in simple test programs that did not expect to send 
or receive messages but wanted to accept external interrupts. 

2.12.5.3 Early Fault 

The network interface adds words to the message queue four words at a time, and the processor will dispatch 
as soon as the first four words have been delivered. If a longer message occurs and the code attempts to 
read an argument that has not been delivered or attempts to suspend before the entire message has arrived, 
an early fault occurs. This is handled in the same manner as for a send-fault: pause and then retry the
instruction. This fault cannot be disabled. 

2.12.5.4 Queue Overflow Fault 

Queue overflow interrupts are signalled when the last empty word of the queue is written, but may cause 
an interrupt only when running at the same priority as the queue which overflowed. In other words, if 
the priority 0 queue overflows and a priority 1 process is currently running then the handler for the queue 
overflow must wait until all pending priority 1 processes have suspended and the processor has returned 
to priority 0 before the fault handler will be executed. Likewise, if the priority 0 queue overflows and a 
background mode process is currently running with interrupts disabled then the handler must wait until the 
background permits the processor to switch to priority 0. 

Note that a queue overflow occurs whenever the message queue contains 1 more word than the mask field
of the QBM register for the associated priority. If the queue has the maximum length of $400 words, then 
the mask field of the QBM is $3FF, ten bits all set to 1, but the length field of the QHL at the time of the 
fault will have just overflowed from $3FC to $000. This is the correct operation of the QHL register. 
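The wrap of the length field from $3FC to $000 is ordinary masked arithmetic. The Python sketch below assumes, as described earlier in this chapter, that the length advances in four-word rows; the function name is illustrative.

```python
def advance_length(length: int, mask: int, row: int = 4) -> int:
    """Add one four-word row to the QHL length field, wrapping at the queue size."""
    return (length + row) & mask
```

With the maximum mask of $3FF, `advance_length(0x3FC, 0x3FF)` returns 0: the queue is completely full, but the ten-bit field can no longer represent its length.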

This fault may be handled by copying the contents of the queue into memory and then arranging to 
sequence the messages from there. Writing this code is not for the faint of heart but it is practical. This 
fault is disabled whenever the I-bit is set or when the F-bit is set. The queue-overflow fault handler must 
explicitly write to the associated QBM register before returning in order to clear the fault indication.

2.12.6 Initializing the Network 

There are several steps that must be followed to initialise the network: the queue registers must be set, the 
NNR register must be set, and the internal datapath of the routers must be set. 

The first step is to allocate the message queues, clear the D-bit in the QBMs, and then accept interrupts
by clearing the I-bit. 

Every node must be assigned a unique node number and this number must be written to the Node
Number Register before any messages can be sent. The nodes of the J-Machine are physically connected in a
3 dimensional space but the MDP cannot trivially determine where it is in this space. There are two standard
approaches to informing each node of its node number: write the node-id into a known memory location on
each processor from the console before starting the machine, or send a node initialisation message to each
processor. The disadvantage to the first approach is that it requires the front-end host to cycle among all of
the nodes and write a different number into every processor. This is potentially quite slow on a 1024 node
J-Machine and lacks finesse.

An alternative solution, and one that is demonstrated in one of the programming examples, is to send a 
message to perform initialisation. This message contains the node-id of the processor as one argument and 
the maximum node-id of any node in the J-Machine as a second argument. The node sets its own NNR and
then determines which of its forward neighbors are present. It computes the node-id for each of the possible
dimensions and forwards the initialisation message. This message fans out in a wave through the machine.
This approach works because the MDP routers always accept the first message after reset. 

The MDP contains three routers, one for each dimension, that handle the actual transmission of data. 
Each router contains an internal address register that cannot be written directly from the integer core. 
Instead these registers are programmed by sending a message with the correct address in it from the node 
itself. Thus, when the node determines its node-id and has set the NNR, it must send a message to itself 
at each priority. This message can immediately suspend as it was sent only for this effect. This is also 
demonstrated in the programming example. 
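The way the initialisation message spreads can be simulated in a few lines of Python. This is a sketch under several assumptions that the text does not fix: nodes are identified by (x, y, z) coordinates rather than packed node-ids, the wave starts at node (0, 0, 0), a "forward neighbor" is the node one step higher in a dimension, and a node ignores any initialisation message after the first.

```python
from collections import deque


def fan_out(max_coord):
    """Simulate the initialisation wave; return {node: hop at which it was set}."""
    nnr = {}                                   # nodes whose NNR has been written
    messages = deque([((0, 0, 0), 0)])         # (destination, hop count)
    while messages:
        node, hop = messages.popleft()
        if node in nnr:
            continue                           # already initialised
        nnr[node] = hop
        for dim in range(3):
            if node[dim] < max_coord[dim]:     # forward neighbor is present
                nxt = list(node)
                nxt[dim] += 1
                messages.append((tuple(nxt), hop + 1))
    return nnr
```

Calling `fan_out((7, 7, 15))` models an 8 x 8 x 16 arrangement of 1024 nodes. In this model every node is reached, and the farthest node is initialised after 29 hops, compared with 1024 separate writes from the host in the first approach.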



2.13 EXTERNAL MEMORY INTERFACE

The external memory interface supports the use of up to 1 Mword of off-chip DRAM. It provides single-bit
error correction and double-bit and multiple nibble error detection, DRAM refresh, and supports both
static-column and page-mode memory devices. It is controlled by two registers: the refresh timer, and the
error control.

DRAM refresh is disabled after reset. It is enabled by writing a value to the 8 bit emurtc register. This
register is memory mapped at location $FFFF0. The value stored into this location is loaded into an internal
counter and incremented by 1 on every cycle. When this value is incremented to $FF a DRAM refresh is
performed and the start count is reloaded into the counter.

The error control register, emuerc at location $FFFF1, controls the error correction mode, the type of
DRAM being used, and counts the number of single-bit errors corrected.

[Figure: error control register layout, with field boundaries marked at bits 9, 8, 7 and 0.]

The M-bit indicates the type of DRAM being supported. It is set for page-mode and clear for static-column.
Setting the D-bit disables error correction. The M-bit is unspecified after reset and the D-bit is set.

The memory interface includes a 12 bit data bus. It takes at least 4 cycles to read and write external
memory, one for setup and three more to transfer an MDP word. If error-correction is enabled then an
additional cycle is performed to read or write the error syndrome. No additional delay is incurred to perform
the error check itself unless the data being read must be corrected. A dramerr trap is taken if the data read
contains more than 1 bit error.

2.14 DIAGNOSTIC INTERFACE

The diagnostic interface is used to "boot" the J-Machine and to support low-level debugging operations. It
can be used as the basis for all interactions between the machine and the front-end host but the intent is
that it be used to install the initial code and that a network interface adapter be used after that.

The MDP contains an internal 64-bit shift-register that is used to enter commands to read and write
memory, to stop, start, and step the processor, and to cause an external interrupt. The format of the register
is shown below:

     63      56 55             36 35                      0
    | command  |     offset      |        Data Word        |

This register is loaded through a bit-serial shift-port that is controlled by a special host-interface board
present in the front-end, and by registers on each processor board. When the J-Machine is first turned on
and reset, the memory is in an unknown state and only a few of the registers have a defined state. The
processor core is halted, i.e. it is not fetching instructions. Of particular concern is that the network control
registers have not been initialised and therefore the MDP will not accept messages. The diagnostic port is
used to install, at least, the initialisation code for the programming system and to start the processor at
$400 in A0-absolute mode. This code must initialise the major state and enable interrupts. It is also possible
to generate an asynchronous interrupt fault through this interface.
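The 64-bit diagnostic shift register carries a command, an offset and a 36-bit data word. The Python sketch below packs and unpacks such a value, assuming the field boundaries are command in bits 63-56, offset in bits 55-36 and data in bits 35-0. The function names are hypothetical, and the command value used in any example is arbitrary because the command encodings are not given in this section.

```python
COMMAND_BITS, OFFSET_BITS, DATA_BITS = 8, 20, 36   # assumed field widths


def pack_diag(command: int, offset: int, data: int) -> int:
    """Build the 64-bit value to be shifted into the diagnostic register."""
    if not (0 <= command < 1 << COMMAND_BITS
            and 0 <= offset < 1 << OFFSET_BITS
            and 0 <= data < 1 << DATA_BITS):
        raise ValueError("field out of range")
    return (command << 56) | (offset << 36) | data


def unpack_diag(reg: int):
    """Split a 64-bit register value into (command, offset, data)."""
    command = (reg >> 56) & 0xFF
    offset = (reg >> 36) & 0xFFFFF
    data = reg & ((1 << DATA_BITS) - 1)
    return command, offset, data
```

The three widths add up to 64 bits, and a 20-bit offset is just enough to address the 1 Mword memory space one word at a time.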






Chapter 3 

REGISTERS AND MEMORY 



The MDP register file is divided into several different classes based on the associated functionality; there are 
a small number of general purpose registers that can hold values of any type, a set of address registers, ID 
registers used for name translation, and a number of special purpose registers used to support fast trapping,
the network interface, and so on. The primary programmer registers are replicated for each priority of
operation to support fast context switching among the background and the two message priorities. 

The figures included below indicate which priorities are supported on the right. For example, there is a 
unique set of general purpose registers for every priority, the ID registers are replicated for priority 0 and 
priority 1 but not the background, and the NNR is a "global" register and is not replicated for any of the 
priorities. 



3.1 DATA REGISTERS

There are four 36 bit data registers for each of the three priorities and each register can store values of any 
type. No special meaning is assigned to any of the fields of these registers. The core arithmetic and logical 
operations require one operand to be in a data register and for the destination to be a data register. Long 
constants, described in more detail later, are always stored in R0, making this register slightly less valuable
for storing intermediate values than the other registers. Instruction formats and restrictions are discussed 
in more detail in a later section. 



     35   32 31                          0
    |  tag  |            data             |    R0 - R3    Pris: All
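A 36-bit register value can be thought of as a tag and a data field. The Python sketch below assumes a 4-bit tag in the top bits and a 32-bit data field below it; the tag numbers used in any example are arbitrary, since the tag encodings are not listed in this section.

```python
TAG_SHIFT = 32
TAG_MASK = 0xF
DATA_MASK = (1 << 32) - 1


def pack_word(tag: int, data: int) -> int:
    """Combine a 4-bit tag and 32 bits of data into one 36-bit word."""
    return ((tag & TAG_MASK) << TAG_SHIFT) | (data & DATA_MASK)


def unpack_word(word: int):
    """Split a 36-bit word into (tag, data)."""
    return (word >> TAG_SHIFT) & TAG_MASK, word & DATA_MASK
```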



3.2 ADDRESS REGISTERS 

The address registers A0-A3 are used for all memory references. As for the data registers, there are three 
independent sets of address registers. These registers are always read as ADDR-typed values regardless of 
the type of value written to them. Two of the registers, A0 and A3, have special significance during program 
execution (see A0-absolute/relative mode and A3-relative mode). 



[Figure: format of A0 - A3. Fields: tag 0011 (bits 35-32), r (relocatable, bit 31), i (invalid, bit 30), 
base (bits 29-10), length (bits 9-0). Priorities: All] 



Setting the invalid bit causes all memory references using the address register to signal an invalid address, 
INVADR, fault. This bit may be set by certain runtime system primitives to assist in memory compaction, 
object relocation, and so on. 

Setting the relocatable bit indicates that the address refers to an object that may be moved. This bit is 
purely for the convenience of the runtime system and has no effect on the processing performed by the MDP. 
This bit allows a post-heap-compaction invalidation of only the relocatable addresses, leaving the locked 
down physical addresses intact. 






Note carefully that the MDP is a word-addressed machine; the offset stored in an address register is to 
the start of a full 36-bit word within the memory space of the processor. This representation permits the 
addressing of 1 Mword, effectively 4 Mbytes, of tagged data. 
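
The field layout of an address register can be illustrated with a small packing and unpacking routine. This is only a sketch in Python, not MDP code: the bit positions are read from the register figure and should be treated as an assumption, and the helper names `pack_address` and `unpack_address` are hypothetical.

```python
# Bit positions are read from the A0-A3 register figure and are an
# assumption of this sketch, not a statement of the hardware specification.
TAG_SHIFT, R_BIT, I_BIT, BASE_SHIFT = 32, 31, 30, 10
BASE_MASK = (1 << 20) - 1    # 20-bit base: 2**20 words = 1 Mword
LENGTH_MASK = (1 << 10) - 1  # 10-bit length


def pack_address(tag, relocatable, invalid, base, length):
    """Build a 36-bit address-register word from its fields (hypothetical helper)."""
    return ((tag & 0xF) << TAG_SHIFT
            | (relocatable & 1) << R_BIT
            | (invalid & 1) << I_BIT
            | (base & BASE_MASK) << BASE_SHIFT
            | (length & LENGTH_MASK))


def unpack_address(word):
    """Split a 36-bit address-register word back into its fields."""
    return {
        "tag": (word >> TAG_SHIFT) & 0xF,
        "relocatable": (word >> R_BIT) & 1,
        "invalid": (word >> I_BIT) & 1,
        "base": (word >> BASE_SHIFT) & BASE_MASK,  # word offset, not a byte offset
        "length": word & LENGTH_MASK,
    }
```

Because the base is a word offset, a 20-bit base field is what gives the 1 Mword range quoted above.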



3.3 INSTRUCTION POINTER 

The instruction pointer indicates the next instruction to be fetched and contains a number of flags that have 
a significant impact upon the execution of the program. 



[Figure: format of IP. Fields: tag 0100 (bits 35-32), U (bit 31), F (bit 30), offset (bits 29-10), 
P (bit 9), A (bit 8), zero (bits 7-0). Priorities: All] 



The offset stored in the IP is either an absolute address within the memory space of the MDP or an offset 
relative to the base of the current method pointed to by A0 (see A0-Absolute Addressing). The current 
mode is indicated by bit 8, the A0-Absolute bit. When this bit is set, all memory references via A0 are 
absolute addresses and the actual value in A0 is ignored. When it is clear, they are relative to the value in 
A0. 

Two instructions may be stored in a single word. The phase bit, bit 9, indicates which instruction is 
being fetched. If P is 0, the first instruction, stored in the upper half of the word, is fetched; otherwise the 
instruction stored in the low half is fetched. 

The U-bit, bit 31, is a copy of the unchecked bit stored in the status register. Changing it by altering 
the IP register changes it in the status register and vice versa. 

The F-bit, bit 30, is similarly a copy of the fault bit of the status register. Its use is discussed later. 
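
The flag positions described above can be summarised by a small decoder. This is a sketch in Python rather than MDP code. The U, F, P and A bit numbers come from the text; the position of the offset field (bits 29-10) is read from the register figure and is an assumption, as are the function names.

```python
def decode_ip(word):
    """Extract the IP fields from a 36-bit word (illustrative only)."""
    return {
        "unchecked": (word >> 31) & 1,      # U-bit, mirrors the status register
        "fault": (word >> 30) & 1,          # F-bit, mirrors the status register
        "offset": (word >> 10) & 0xFFFFF,   # assumed to occupy bits 29-10
        "phase": (word >> 9) & 1,           # P: 0 = upper half, 1 = low half
        "a0_absolute": (word >> 8) & 1,     # A: 1 = offset is an absolute address
    }


def instruction_half(word):
    """Name the half of the instruction word selected by the phase bit."""
    return "low" if decode_ip(word)["phase"] else "high"
```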



3.4 FAULT REGISTERS 

A key design feature of the MDP is hardware support for trap handling in exceptional circumstances. Traps 
are taken when data type tags differ from the expected value, to support synchronisation mechanisms via 
the CFUT type, to assist in the management of the message queue, and so on. A complete description of 
the fault types is included later. The call mechanism is based closely on the fault mechanism. 

When an exceptional situation is detected, up to four registers are updated with the key processor state 
to permit the software trap handler to determine the reason for the fault. 

The Faulted Instruction Pointer, FIP, is loaded with the current IP when the fault is taken. Since the IP 
is pre-incremented, the FIP contains the IP of the instruction immediately following the faulted instruction. 
If the instruction must be reexecuted after taking some appropriate action, the fault handler is responsible 
for backing up the IP in software. 



[Figure: format of FIP. Fields as for the IP: tag 0100 (bits 35-32), U (bit 31), F (bit 30), 
offset (bits 29-10), P (bit 9), A (bit 8), zero (bits 7-0). Priorities: All] 



The faulted operand registers, FOP0 and FOP1, are loaded with the values of the Op0 and Op1 
operands whenever an instruction-specific fault occurs. Note that it is necessary to have a detailed 
understanding of the instruction format to determine which value will be stored in each register. If a fault occurs 
that was not caused by a specific instruction, then the value written into these registers is indeterminate. If 
a faulting instruction has no Op0 or Op1 operand, then the value of FOP0 and/or FOP1 is indeterminate. 



[Figure: format of FOP0 - FOP1. Fields: tag (bits 35-32), data (bits 31-0). Priorities: P0, P1] 






The Faulted Instruction Register, FIR, contains the instruction that caused the exception or NIL if the 
fault is not associated with any instruction. The instruction is always stored in the low 17 bits regardless of 
which half of the word it appeared in during the program execution. 

[Figure: format of FIR. Fields: tag, zero padding, instruction (bits 16-0).] 



3.5 NETWORK CONTROL REGISTERS 

The node number of a node and the extent of the two message queues must be set by the programmer during 
program initialisation. 



[Figure: format of NNR. Fields: tag, zero padding, then the zdim, ydim and xdim fields in the low bits.] 



The node number register, NNR, contains the network node number of this node. It consists of an X 
field, a Y field and a Z field indicating the position of the node in the 3D network grid. Its value identifies 
the processor in the network and is used for routing. The NNR should be initialised by software after a reset 
and left in that state. The NNR is read and written as an INT-tagged value. 

The queue base/mask register, QBM, indicates the base and extent of the input message queue. The 
base is the first memory location used by the queue. The mask must be of the form 2^n - 1, with n >= 2. 
The size allocated to the queue is equal to the mask plus 1. There is one more restriction: (base AND mask) 
= 0 must hold. This effectively means that the base must be a multiple of the size of the queue, and this 
size must be a power of 2. These conditions allow queue access and wraparound to work by simply ANDing 
the offset within the queue with the mask and then ORing with the base. The disable bit, bit 30, should 
normally be zero. Setting it disables message reception at the priority level of the QBM register, which may 
cause messages to be backed up in the network. This should be done only under special circumstances, such 
as when the queues are being moved. The disable bit is set after reset. The QBM register is read and written 
as an ADDR-tagged value. 
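
The base and mask rules can be checked with a few lines of code. The following is a sketch in Python, not MDP code; the function names are hypothetical, and the alignment test encodes the stated requirement that the base be a multiple of the queue size.

```python
def is_valid_qbm(base, mask):
    """Check the constraints described for the queue base and mask."""
    mask_ok = mask >= 3 and (mask & (mask + 1)) == 0  # mask = 2**n - 1 with n >= 2
    aligned = (base & mask) == 0                      # base is a multiple of mask + 1
    return mask_ok and aligned


def queue_address(base, mask, offset):
    """Map an offset within the queue to a memory address, with wraparound."""
    return (offset & mask) | base
```

For a hypothetical 256-word queue at base $100, `queue_address(0x100, 0xFF, 0x105)` gives $105, and an offset of $200 wraps back to the base, $100.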

The queue head/length register, QHL, contains two fields, head and length, that describe the current 
dynamic state of the queue. Head is an absolute pointer (i.e. relative to the beginning of memory, not 
the beginning of the queue) to the first word that contains valid data in the queue, while length contains 
the number of valid data words in the queue. The length is zero when the queue is empty, and 1 greater 
than the mask when the queue is full. The length can also be zero if the message queue is the maximum 
possible length, 1024 words, and the queue is full. The queue is clearly full if the length is zero and yet the 
queue-overflow fault has been signalled. QHL is read and written as an ADDR-tagged value. 



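[Figure: format of QBM. Fields: tag 0011 (bits 35-32), x (bit 31), d (disable, bit 30), base (bits 29-10), 
mask (bits 9-0). Priorities: P0, P1] 

[Figure: format of QHL. Fields: tag 0011 (bits 35-32), x (bit 31), x (bit 30), head (bits 29-10), 
length (bits 9-0). Priorities: P0, P1] 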



3.6 NAME CACHE CONTROL 



The translation base/mask register, TBM, is used to specify the location of the two-way set-associative 
lookup table used by the XLATE and ENTER instructions. The format of the TBM register is similar to 
that of the QBM register. Again, base is the first memory location used by the table. The mask must be of 
the form 2^n - 1, with n >= 2. The number of words occupied by the table is equal to the mask plus 1. As 
in QBM, (base AND mask) = 0 must hold. TBM is read and written as an ADDR-tagged value. 






[Figure: format of TBM. Fields: tag 0011 (bits 35-32), two further bits (bits 31-30), base (bits 29-10), 
mask (bits 9-0). Priorities: Global] 



3.7 ID REGISTERS 



The ID registers provide support for the maintenance of a global namespace within the J-Machine for certain 
programming systems. In such systems objects will be named using an ID, or handle, and these IDs will 
be the sole reference communicated between methods distributed across the machine. This design permits 
objects to be relocated within the private memory of a given node or to be transferred to another node 
without the requirement that a very large number of physical addresses be invalidated within the machine. 
Operating system primitives provide the primary mechanisms for translating these handles into a physical 
address at the site of the object. 

These name translations, from a name to an ADDR-tagged value to be stored in an address register, 
can be accelerated using the enter and translate operations provided by the name cache system. In normal 
practice, ID register N will hold the ID of the relocatable object pointed to by address register N. The name 
translate operator automatically stores the ID in the appropriate register during an XLATE. 



[Figure: format of ID0 - ID3. Fields: data (bits 31-0). Priorities: P0, P1] 



3.8 MEMORY ADDRESS REGISTER 

The Memory Address Register, MAR, is provided for debugging purposes and is probably of little use to 
the applications programmer. This register contains the absolute address of the last memory location read 
or written by the execution of an instruction. The MAR can only be read. 



[Figure: format of MAR. Fields: tag, zero padding, memory address (bits 19-0). Priorities: Global] 



3.9 PROCESSOR STATUS FLAGS 

The status register is a collection of flags that may be accessed individually using READR, WRITER, or 
the alias MOVE. The status register cannot be accessed as a unit. It contains these flags: 



[Figure: format of a status flag S ∈ { I, B, P, U, F, Q, M }. Fields: tag 0010 (bits 35-32), zero padding, 
flag value S (bit 0).] 

Flag  Name                  0                       1 
I     Interrupt Mask        Interrupts Allowed      Interrupts Disabled 
B     Background Execution  Message                 Background 
P     Priority Level        Level 0                 Level 1 
U     Unchecked Mode        Checked                 Unchecked 
F     Faults Disabled       Normal                  Faults Disallowed 
Q     Queue Wrap Around     A3 Points Into Memory   A3 Points Into Queue 
M     Message Flag          Message Send Complete   Message Being Sent 



Note carefully that the F and U flags are mirrored in the current instruction pointer. They are set or 
cleared explicitly by the user or as a result of loading the instruction pointer from a fault vector, from the 
call vector, from the header of a message, or by the LDIP and LDIPR instructions. 






The priority and background flags specify the current priority level of execution. The highest level is 
priority 1, with the settings P=1, B=0. Below that is priority 0, with P=0, B=0. The lowest priority level 
is background, with B=1. 

The interrupt mask flag determines whether the current process may be interrupted. Setting this flag 
disables all interrupts. Clearing this flag allows interrupts. There are two types of interrupts: synchronous 
faults associated with the current instruction stream, and asynchronous faults which are not. The 
asynchronous faults are disabled whenever the I-bit is set or when the MDP is sending a message to priority 1 
(see M-bit below). The two asynchronous faults, message queue overflow and external interrupt, are also 
disabled when the F-bit (below) is set. 

The fault flag is used to identify periods during which it is not acceptable to take a fault. It is contained 
in the status register and is mirrored in the current instruction pointer. It is normally set while entering a 
fault handler, as part of the IP stored in the fault vector. It can be cleared explicitly once the fault registers 
are saved in memory or when the original instruction pointer is reloaded from the FIP register at the end of 
the fault handler. If a fault occurs when this bit is set, the fault is ignored and a CATASTROPHE fault is 
taken instead. The handler for this fault will generally save the state of the processor and then signal the user 
to indicate that a programming error has occurred. The fault flag is also used to disable the asynchronous 
queue overflow and external interrupt faults. 

The unchecked mode flag determines whether TYPE, CFUT, FUT, TAG8, TAG9, TAGA, TAGB, and 
OVERFLOW faults are taken; when this flag is set, these faults are ignored, which allows more freedom in 
manipulation of data but provides less type checking. There is also a copy of this flag in the IP register. 
Changing this flag changes it in the IP register and vice versa. As with the F flag, there are three copies of 
the U flag, one for each priority level and one for background mode. 

The A3-Queue bit, when set, causes A3 to "wrap around" the appropriate priority queue. This is included 
to allow A3 to act transparently as a pointer to a message, whether it is still in the queue, or copied into the 
heap. If the message is still in the queue, then setting the Q bit allows references through A3 to read the 
message sequentially, even if it wraps around the queue. If the message is copied into an object, then leaving 
the Q bit clear allows normal access of the message in the object. The Q bit is set on message dispatch, and 
it is left to the software to clear the Q bit when a message is copied into the heap. Either way, the access of 
the message pointed to by A3 looks like any other reference through an address register and bounds checking 
is performed. Note that when the Q bit is set, the head of QHL should point to the same place as the base 
of A3 (since the start of the queue is also the start of the next message to be processed). There is a Q bit 
for each priority level, but no Q bit for background mode (because there is no queue for background mode). 

The two M-bits indicate whether a message is being sent TO a particular priority. MP0 is set if the 
message is being sent to priority zero and MP1 is set if a message is being sent to priority one, regardless of 
the current priority. These bits are set by the use of the SEND instruction and cleared by SENDE. MP0 is 
set for the support of certain system oriented routines but has no effect on the execution of the processor. 
The MP1 bit disables interrupts, i.e. it is not possible for the MDP to process an asynchronous fault or 
dispatch a new message while it is sending a P1 message. 



3.9.1 Background Priority 

It has been noted that the background priority is selected when B=1. When B=1 and one of the standard 
background registers is read or written, the appropriate background register will be accessed regardless of 
the P bit. However, it is possible to access a non-background register, e.g. QHL, directly with B=1. In this 
case, the register that is selected is determined by the current value of P. If P=0, QHLP0 will be accessed; 
otherwise QHLP1 is accessed. The P flag is set to 0 during chip reset and we use this as the definition of 
background, i.e. B=1 and P=0. It is generally unwise to run with B=1 and P=1 although there may be 
reasons to do so in special situations. 



3.10 MEMORY MAP 

The MDP contains 4K words of on-chip SRAM. The prototype J-Machine adds a further 252K words of 
error-corrected off-chip DRAM for a total of 256K words. Certain memory locations have special purposes 
assigned to them by the hardware. These are outlined in the table below. 






FROM          TO            Use 
(illegible)   (illegible)   Priority switchable memory 0 
(illegible)   (illegible)   Priority switchable memory 1 
(illegible)   (illegible)   Priority 0 fault vectors 
(illegible)   (illegible)   Priority 1 fault vectors 
$000C0                      Call table 
(illegible)   (illegible)   Uncommitted on-chip RAM 
(illegible)   (illegible)   External memory on J-Machine prototype 
$04100        $FFFEF        Additional external memory address space 
$FFFF0        $FFFF1        EMI control registers 



Within the uncommitted internal RAM, the operating system will allocate several hundred words to the 
message queues and the XLATE cache and leave the rest of RAM for user programs. The call vector table 
length is operating system definable, but its base must be location $000C0. 

3.10.1 Priority Switchable Memory 

In order to allow each priority level to have 32 private temporaries, the first 64 words of memory are decoded 
specially. When accessing one of these 64 words, the current state of the P flag is XORed with bit 5 of the 
address; hence, referencing location 1 accesses physical location 1 when running in priority level 0 (P flag 
clear) or location 33 when running in priority level 1 (P flag set). This scheme lets the operating system and 
user programs use memory locations 0 through 31 as temporaries private to the current priority level. The 
other priority level's temporary "globals" can be accessed as locations 32 through 63. 
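
The address decoding described above amounts to a single XOR. The following Python sketch (not MDP code; the function name is hypothetical) shows the mapping from a referenced location to a physical location.

```python
def physical_address(address, p_flag):
    """Apply the priority-switch decoding to the first 64 words of memory."""
    if 0 <= address < 64:
        return address ^ (p_flag << 5)  # XOR the P flag into bit 5 of the address
    return address                      # all other locations are unaffected
```

With the P flag set, location 1 maps to physical location 33, and location 33 maps back to physical location 1, which is how each level reaches the other level's temporaries.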

3.11 EXCEPTIONS 

3.11.1 Reset 

When the processor is reset, the status register flags are set as follows: Q*=0, U*=1, F*=0, M*=0, I=1, 
B=1, P=0. The A bit in the background IP and D bits in both QBM registers are set. The background IP 
offset is set to $400. The program that gets executed after a reset should set up the queues, NNR, and at 
least some of the fault vectors and then clear the I flag and the D bits in the QBM registers to allow message 
reception. 

3.11.2 Fault Processing 

When a fault occurs, the instruction that caused the fault is saved in the FIR register, the current IP (which 
points one instruction beyond the faulting instruction) is saved in the FIP register, and the values of the 
Op0 and Op1 operands (if any) are saved in the FOP0 and FOP1 registers; the IP is then fetched from the 
memory location whose address is equal to the fault number plus the base of the fault vector table of the 
current priority (when in Background mode the Priority flag is used to select the fault vector). If the F bit is 
set and a fault occurs then the IP is loaded from the CATASTROPHE fault vector. The U, A, and F bits of 
the IP that gets loaded may change the processor state. U determines if this priority is in unchecked mode, 
A determines if A0-absolute mode is in effect, and the F bit determines whether the fault is non-reentrant 
and interruptible. 

3.11.3 System Calls 

A system call (via the CALL instruction) mimics some of the behavior of a fault to provide convenient 
access to system routines. When a CALL occurs, the base of the system CALL vector table is added to 
the CALL operand, and the contents of this location are fetched, yielding a call handler IP. The current IP 
(which points to the next instruction) is saved in the current priority's FIP register. Execution then begins 
by loading the call handler IP (which sets the F, A, and U bits in the status register to the values in the call 
handler IP). 

3.11.4 Interrupts 

There are three types of interrupts supported on the MDP: priority switches, queue overflow interrupts, and 
external interrupts. Priority switches may occur at any time, provided that the I and MP1 flags are both 
clear. Queue overflow and external interrupts may only occur when all of I, MP1, and F flags are clear. 
Priority switches should be the most common interrupts; these occur when a message arrives in the queue of 
a priority higher than the current priority. Thus, priority 1 messages can interrupt priority 0 or background 
mode, and priority 0 messages can interrupt background mode. The handler for a priority switch is the 
interrupting message itself. 

Queue overflow faults are discussed in the section on the network interface. 

External interrupts are similar to queue overflow interrupts except that whenever the I, MP1, and F 
flags are clear and an external interrupt is signalled, a fault is signalled at the current priority and the IP is 
loaded from the INTERRUPT fault vector. The interrupt is handled as a process of the same priority as the 
priority which it interrupted. An external interrupt is signalled by an external interrupt pin on the MDP 
package. 

Interrupts may occur only between instructions. After an interrupt the FIP points to the next instruction 
of the interrupted sequence. 
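
The enabling conditions for the three kinds of interrupt can be written down as predicates. This is a sketch in Python of the rules stated in this section, not MDP code; the function names are hypothetical, and the background level is represented by the B flag as described under the status flags.

```python
def priority_switch_allowed(i_flag, mp1_flag):
    """Priority switches require both I and MP1 to be clear."""
    return not (i_flag or mp1_flag)


def async_fault_allowed(i_flag, mp1_flag, f_flag):
    """Queue overflow and external interrupts also require F to be clear."""
    return not (i_flag or mp1_flag or f_flag)


def message_preempts(message_priority, b_flag, p_flag):
    """True if an arriving message has a higher priority than the current level."""
    current = -1 if b_flag else p_flag  # background ranks below priority 0
    return message_priority > current
```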



3.11.5 Fault Types 



Name          Number    Description 
CATASTROPHE   $0        (illegible) 
INTERRUPT     $1        (illegible) 
QUEUE         $2        (illegible) 
SEND          $3        Send buffer full. 
ILGINST       $4        Illegal instruction. 
DRAMERR       $5        Double bit error in the external RAM. 
INVADR        $6        Attempt to access data through address register with I bit set. 
ADRTYPE       $7        The address index is not an integer. 
LIMIT         $8        Attempt to access object data past limit. 
EARLY         $9        Attempt to access data in message queue before it arrived. 
MSG           $A        Bad message header. 
XLATE         $B        XLATE missed. 
OVERFLOW      $C        Integer arithmetic overflow. 
CFUT          $D        Attempted operation on a word tagged CFUT. 
FUT           $E        Attempted operation on a word tagged FUT. 
TAG8          $F        Attempted operation on a word tagged TAG8. 
TAG9          $10       Attempted operation on a word tagged TAG9. 
TAGA          $11       Attempted operation on a word tagged TAGA. 
TAGB          $12       Attempted operation on a word tagged TAGB. 
TYPE          $13       Operand(s) with a bad tag type used in an instruction. 
              $14-$1F   Reserved for future faults. 



Note: If multiple faults occur simultaneously the fault vector chosen is the one that has the highest 
precedence. Each fault is assigned a precedence by its fault number; lower fault numbers correspond to 
higher precedence. 
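
The precedence rule, together with the vector lookup described under Fault Processing, can be sketched as follows. This is Python, not MDP code. Only a subset of the fault numbers is listed, the function names are hypothetical, and the table base used in the usage note is a made-up value, not the address of the real vector table.

```python
# A subset of the fault numbers from the table above.
FAULT_NUMBERS = {
    "INVADR": 0x6,
    "ADRTYPE": 0x7,
    "LIMIT": 0x8,
    "EARLY": 0x9,
    "MSG": 0xA,
    "OVERFLOW": 0xC,
}


def select_fault(pending):
    """Pick the fault to service: the lowest number has the highest precedence."""
    return min(pending, key=FAULT_NUMBERS.__getitem__)


def vector_address(table_base, fault):
    """Address of the word holding the handler IP: table base plus fault number."""
    return table_base + FAULT_NUMBERS[fault]
```

If OVERFLOW, LIMIT and MSG were raised together, `select_fault` returns LIMIT; with a hypothetical table base of $80 its vector would be read from $88.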






Chapter 4 

INSTRUCTION SET 



The instruction set of the MDP has been designed to be convenient, flexible, and to permit a reasonably 
tight encoding to make efficient use of memory. It is not a true RISC-like load/store instruction set, but it is 
certainly streamlined and should be readily familiar to users of current microprocessors. The processor core 
supports three-operand addressing, two for the source and one for the destination, and the basic instructions 
operate primarily on the general registers. In contrast to common convention in the current generation of 
so-called RISC chips, one of the operands can reference a slot in memory. There is one addressing mode; 
base with offset. The offset can be a small immediate or the contents of a general register. There is 
also support for accessing a small set of common program constants in a more efficient manner than the 
general mechanism for long constants. In the description that follows, the assembler syntax for the standard 
assembler, MDPSim, is included for clarity. 

4.1 INSTRUCTION ENCODING AND ADDRESS MODES 

The program executed by the MDP consists of instructions and constants. A constant is any word not tagged 
INST0 through INST3 that is encountered in the instruction stream. When a constant word is encountered, 
that word is loaded into R0 and execution proceeds with the next word. 

Every instruction is 17 bits long. Two 17-bit instructions are packed into a word. Since a word has only 
32 data bits, two tag bits are also used to specify the instructions. The instruction in the high part of the 
word is executed first, followed by the instruction in the low part of the word. As a matter of convention, if 
only one instruction is present in a word, it should be placed in the high part, and the low part of the word 
set to all zeros. 

The format of an instruction is as follows: 

 16        11 10   9 8    7 6          0 
+------------+------+------+------------+ 
|   opcode   | op2  | op1  |    op0     | 
+------------+------+------+------------+ 

The opcode field specifies one of 64 possible instructions. The other fields specify three operands; 
instructions that don't require three operands may ignore some of the operand fields. Operands 1 and 2 must 
be data registers; their numbers (0 through 3) are encoded in the op1 and op2 fields. Operand 2, 
if used, is always the destination of an operation and operand 1, if used, is always a source. Op0 is encoded 
in one of two ways depending on the operator. It can be used as a source or the destination and supports a 
variety of accesses. 
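
The instruction format can be made concrete with a small decoder. This is a Python sketch, not MDP code. The field boundaries follow the format shown above; the assumption that the two instructions of a word occupy its low 34 bits, with the first in bits 33-17, is an inference from the packing description, and the function names are hypothetical.

```python
def decode_instruction(inst):
    """Split a 17-bit instruction into its four fields."""
    return {
        "opcode": (inst >> 11) & 0x3F,  # 6 bits: one of 64 instructions
        "op2": (inst >> 9) & 0x3,       # data register number 0-3
        "op1": (inst >> 7) & 0x3,       # data register number 0-3
        "op0": inst & 0x7F,             # 7-bit general operand
    }


def split_word(word):
    """Return the (first, second) instructions packed into one word.

    Assumes the instructions fill the low 34 bits of the 36-bit word.
    """
    return (word >> 17) & 0x1FFFF, word & 0x1FFFF
```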

4.1.1 Normal Addressing Mode 

This is the addressing mode used for Op0 by almost all of the MDP's instructions. It permits access to the 
data and address registers, the use of short constants and of certain long constants, and address-register 
indirect accesses. The operand can be a source or a destination, but it is nearly always the source. Typical 
uses of this operand are: 






    add   r0, r1, r2          ; r1 is Op0 
    add   r0, 3, r0           ; A small immediate in the range -16 to 15 
    add   r2, [6, a2], r2     ; Offsets in the range 0 to 16 
    and   r0, $3FF, r0        ; A special immediate constant 
    add   r0, a1, r3          ; Uncommon. Only permitted in unchecked mode 
    send2 [3, a2], r1, 0      ; Op0 is sent first, therefore unusual syntax 

    move  27, r0              ; Immediates in the range -32 to 31 
    move  [49, a1], r2        ; Offsets in the range 0 to 63 

Primitives with two source operands can represent the immediate integers -16 to 15, and immediate 
offsets in the range 0 to 31. If there is only one source, then the second operand field is used to extend 
the range by 2 bits, i.e. -32 to 31 for values, and offsets in the range 0 to 63. If the instruction has just 
one source operand then the extension is always taken from Op2 otherwise it is instruction dependent. The 
complete specification of this addressing mode is indicated below: 



Normal Addressing Mode 

Syntax        Addressing Mode 
Rn            Data Register Rn 
An            Address Register An 
NIL           Immediate Constant NIL (SYM:0) 
FALSE         Immediate Constant FALSE (BOOL:0) 
TRUE          Immediate Constant TRUE (BOOL:1) 
$80000000     Immediate Constant INT:$80000000 
$FF           Immediate Constant INT:$000000FF 
$3FF          Immediate Constant INT:$000003FF 
$FFFF         Immediate Constant INT:$0000FFFF 
$FFFFF        Immediate Constant INT:$000FFFFF 
[Rx, An]      Offset Rx in object An 
imm           Immediate (signed) 
[imm, An]     Offset imm (unsigned) in object An 






4.1.2 Register Oriented Addressing Mode 



The register-oriented op0 mode is used instead of normal op0 mode by the READR, WRITER, and LDIPR 
instructions. This mode provides access to registers at different priorities and to the special registers. The 
register-oriented op0 mode encodings are as follows: 



Register Addressing Mode (op0 field, bits 6-0) 

B  P  bits 4-0     Syntax  Addressing Mode 
B  P  0 0 0 Rn     Rn      Data Register Rn 
B  P  0 0 1 An     An      Address Register An 
-  P  0 1 0 IDn    IDn     ID Register IDn 
B  P  0 1 1 0 0    FIP     Faulted Instruction Pointer 
-  P  0 1 1 0 1    FIR     Faulted Instruction Register 
-  P  0 1 1 1 0    FOP0    Faulted Operand Zero Register 
-  P  0 1 1 1 1    FOP1    Faulted Operand One Register 
-  P  1 0 0 0 0    QBM     Queue Base/Mask Register 
-  P  1 0 0 0 1    QHL     Queue Head/Length Register 
B  P  1 0 0 1 0    IP      Instruction Pointer 
-  -  1 0 0 1 1    TBM     Translation Base/Mask Register 
-  -  1 0 1 0 0    NNR     Node Number Register 
-  -  1 0 1 0 1    MAR     Memory Address Register 
-  -  1 0 1 1 0            Unused (ILGINST Fault) 
-  -  1 0 1 1 1            Unused (ILGINST Fault) 
-  -  1 1 0 0 0    P       Priority Level Flag 
-  -  1 1 0 0 1    B       Background Execution Flag 
-  -  1 1 0 1 0    I       Interrupt Flag 
B  P  1 1 0 1 1    F       Fault Flag 
B  P  1 1 1 0 0    U       Unchecked Flag 
-  P  1 1 1 0 1    Q       A3-Queue Flag 
-  -  1 1 1 1 0    M       Send Flag 
-  -  1 1 1 1 1            Unused (ILGINST Fault) 



B represents the use of the Background register set or one of the two priority register sets. The B 
bit is XORed with the Background Flag and a register set chosen according to the result; 1 indicates the 
background registers, while 0 indicates the register set chosen by the P bit relative to the present priority. 
The assembler syntax for specifying a register belonging to the background is the register name followed by 
a "B". 

P represents the priority of the register being accessed, and is relative to the current priority. 0 indicates 
the current priority, while 1 indicates the other priority. The assembler syntax for specifying a register 
belonging to the other priority is the register name followed by a backquote (`). 
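
The relative interpretation of the B and P bits can be expressed as a short selection function. This is a Python sketch of the rule as described above, not MDP code; the function name and the returned strings are hypothetical.

```python
def select_register_set(b_bit, p_bit, background_flag, priority_flag):
    """Choose a register set from the op0 B/P bits and the current status flags."""
    if b_bit ^ background_flag:
        return "background"
    # P is relative: 0 selects the current priority, 1 selects the other one.
    return "priority %d" % (priority_flag ^ p_bit)
```

Running at priority 1 with both bits clear selects the priority 1 registers; setting P in the operand selects priority 0 instead, and setting B selects the background set.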

4.1.3 Instruction Row Buffer 

The message driven processor includes a 4 word Instruction Row Buffer, IRB, to decrease contention for the 
on-chip SRAM. When instructions are being fetched from on-chip memory, the IRB is loaded with up to 8 
instructions in a single cycle and then the instruction decoding occurs from the IRB. The words fetched to 
fill the IRB are aligned on a four word boundary even if the code branches into the middle of such a block. 
Branch instructions always refill the IRB even if the target instruction is already in the IRB. The IRB is not 
used for instructions that are located in off-chip memory. 
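
The alignment rule for IRB fills can be illustrated as follows. This is a Python sketch, not MDP code; the function name is hypothetical and the addresses in the usage note are arbitrary.

```python
IRB_WORDS = 4  # four words, i.e. up to eight instructions


def irb_block(word_address):
    """Return the word addresses fetched into the IRB for a given target address."""
    base = word_address & ~(IRB_WORDS - 1)  # round down to a four-word boundary
    return list(range(base, base + IRB_WORDS))
```

A branch to word $406 would therefore load words $404 through $407, even though execution resumes in the middle of the block.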

As with most processors, programmers must be wary of mutating the instruction stream that is about to 
be executed. Attempting to modify an instruction that has already been fetched into the IRB will not affect 
the current execution of that code but will successfully modify the in-memory version of the instruction and 
all future executions of the instruction. 



4.2 MOVE AND TYPE OPERATIONS 

There are two classes of move operations: the read/write pair that uses the standard addressing mode, and 
the readr/writer pair that uses an extended mode that permits access to the special registers and to the 
registers at other priorities (see normal addressing mode and register-oriented addressing mode). 



                                                      Source Types 

READ    Src, Rd         Rd <- Src                     All but CFut 
READR   Src, Rd         Rd <- Src                     All but CFut 

In checked mode, these operations fault if the operand is a CFUT. 

WRITE   Rs, Dst         Dst <- Rs                     All 
WRITER  Rs, Dst         Dst <- Rs                     All 

The WRITE instruction may be used to write a value of any type into memory. The WRITER instruction 
is used to write values to registers. In checked mode, type checking is done for values being moved into an 
address register or the IP. All types, including CFUT, may be moved into the other registers. 

LDIP    Src             IP <- Src                     IP 
LDIPR   Src             IP <- Src                     IP 

These instructions load the IP using normal addressing mode and register addressing mode respectively. In 
checked mode, the value must be of type IP. 

CHECK   Rs, Src, Rd     Rd <- BOOL:tag(Rs) == Src     All,Int 
RTAG    Src, Rd         Rd <- INT:tag(Src)            All but CFut 
WTAG    Rs, Src, Rd     Rd <- Src:Rs                  All,Int 

These operators permit the type of a value to be inspected or modified. Rs can be of any type for Check and 
Wtag. Src must be an integer in checked mode. Src is treated as an integer in the range 0-15 inclusive in 
unchecked mode. Rtag faults CFUT in checked mode. 



4.3 ARITHMETIC OPERATIONS 

The MDP implements a full set of arithmetic operations on integers. Integers are represented as signed 
32-bit fixnums in 2's complement format. The format of a typical arithmetic instruction is 

<op> Rs, Src, Rd 

This means that Rs and Rd must be one of the four general registers R0-R3. The Src operand is selected 
from the set of "normal" addressing modes described above. Note that using an address register as a source 
would only be appropriate in unchecked mode. Similarly, 3 of the special constants are not of type INT and 
would only be appropriate in unchecked mode. 



ADD      Rs, Src, Rd     Rd <- Rs + Src                 Int,Int
CARRY    Rs, Src, Rd     Rd <- Carry(Rs + Src)          Int,Int
SUB      Rs, Src, Rd     Rd <- Rs - Src                 Int,Int
NEG      Src, Rd         Rd <- -Src                     Int



CARRY returns 1 if adding the two numbers would generate an unsigned carry and 0 otherwise. It should
not be used in checked mode, as it causes an overflow under the same conditions that ADD overflows. ADD
and SUB produce results modulo 2^32 in unchecked mode. In unchecked mode the type of Rd is the same as
the type of Rs. An overflow occurs in checked mode when the signed result is not the sum/difference of the
signed operands. NEG can overflow if Src = $80000000.
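
The overflow and carry rules above can be checked with a small Python model. This is a sketch under stated assumptions: the function names are invented for illustration, operands are plain Python integers standing in for 32-bit data, and tags and fault dispatch are not modeled.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a 2's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def add_checked(rs: int, src: int):
    """Return (wrapped result, overflow) for a signed 32-bit add."""
    exact = rs + src
    wrapped = to_signed32(exact)
    return wrapped, wrapped != exact


def carry(rs: int, src: int) -> int:
    """Return 1 if the unsigned 32-bit add would carry out of bit 31."""
    return 1 if (rs & MASK32) + (src & MASK32) > MASK32 else 0


def neg_checked(src: int):
    """Return (wrapped result, overflow) for signed 32-bit negation."""
    exact = -src
    wrapped = to_signed32(exact)
    return wrapped, wrapped != exact
```

In this model only one input makes `neg_checked` report an overflow: -2^31 (bit pattern $80000000), whose negation is not representable in 32 bits.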

MUL      Rs, Src, Rd     Rd <- Low 32 bits of Rs*Src    Int,Int
MULH     Rs, Src, Rd     Rd <- High 32 bits of Rs*Src   Int,Int

These operations generate a 64-bit product of their 32-bit signed inputs. MULH returns the high 32 bits of
this product while MUL returns the low 32 bits. The tag of the result is always the tag of the first operand. In
checked mode these operations fault unless both source operands are integers. The MUL instruction
faults overflow if the result is not the product of the inputs. The MULH instruction never faults overflow.
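
The relationship between MUL, MULH, and the overflow condition can be sketched as follows. This is an illustrative model, not the hardware algorithm: `mul_pair` is a hypothetical helper that returns both halves of the product together with the condition under which MUL would fault overflow.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a 2's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul_pair(rs: int, src: int):
    """Return (low, high, overflow) for the signed 64-bit product rs*src."""
    exact = rs * src
    p64 = exact & MASK64                 # 64-bit 2's complement product
    low = to_signed32(p64)               # what MUL would return
    high = to_signed32(p64 >> 32)        # what MULH would return
    overflow = low != exact              # MUL result is not the true product
    return low, high, overflow
```

A product that fits in 32 bits leaves the high half equal to the sign extension of the low half (0 or -1), which is why MULH alone never needs to report an overflow.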

ASH      Rs, Src, Rd     Rd <- Rs << Src (arithmetic)   Int,Int
LSH      Rs, Src, Rd     Rd <- Rs << Src (logical)      Int,Int

Src may be negative and may be very large. It is not treated modulo 32; instead, Rs is shifted left by Src
bits, or right by |Src| bits if Src is negative, whatever Src happens to be. For example, if Src = -50, Rd is set to 0
by LSH and by ASH when Rs >= 0 and to -1 by ASH when Rs < 0. ASH treats Rs as a signed quantity,
while LSH treats it as unsigned. An overflow occurs when Src > 0 and significant bits are shifted from the
number; bits shifted to the right from the number are ignored. In unchecked mode the type of Rd is the
same as the type of Rs, and Src is treated as if it were a signed integer.
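
The treatment of large and negative shift counts can be illustrated with a Python sketch. The function names are invented for this example, only the result values are modeled (overflow faults are left out), and `lsh` returns the unsigned 32-bit pattern while `ash` returns a signed value.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a 2's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def lsh(rs: int, src: int) -> int:
    """Logical shift: Rs is unsigned, vacated bits are filled with 0."""
    u = rs & MASK32
    if src >= 0:
        return (u << min(src, 32)) & MASK32
    return u >> min(-src, 32)


def ash(rs: int, src: int) -> int:
    """Arithmetic shift: Rs is signed, right shifts replicate the sign bit."""
    s = to_signed32(rs)
    if src >= 0:
        return to_signed32(s << min(src, 32))
    return s >> min(-src, 32)
```

Clamping the count to 32 is purely a convenience of the model: any count of 32 or more shifts every original bit out, so the result is the same as for a count of exactly 32.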

ROT      Rs, Src, Rd     Rd <- Rs rotate left Src       Int,Int

This is a rotate instead of a shift, so bits shifted out of the left side of Rs are shifted back at the right side. 
Src is an integer treated modulo 32 (since a rotate of 32 bits is the identity transformation). In unchecked 
mode the type of Rd is the same as the type of Rs. 
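
A rotate with the count taken modulo 32 can be sketched in Python as follows. The helper name `rot` is chosen for illustration and the result is returned as an unsigned 32-bit pattern; tags are not modeled.

```python
MASK32 = 0xFFFFFFFF


def rot(rs: int, src: int) -> int:
    """Rotate the 32-bit pattern of rs left by src bits, src taken modulo 32."""
    u = rs & MASK32
    n = src % 32          # Python's % yields 0..31 even for negative src
    return ((u << n) | (u >> (32 - n))) & MASK32
```

Because the count is reduced modulo 32, a negative count behaves as a rotate to the right: `rot(x, -1)` is the same as `rot(x, 31)`.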

FFB      Src, Rd         Rd <- Find First Bit           Int

Rd is loaded with an integer value between 0 and 31, inclusive. This indicates how many bits must be
traversed, going from left to right starting from bit 30, in order to find the first bit not equal to the sign bit
(bit 31). For example, FFB($80000000)=0, FFB($E0000000)=2, and FFB($20000000)=1. This is useful for
normalising floating point values.
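
The definition of FFB can be written out directly as a short Python loop. This is a reference sketch of the stated behaviour, not a description of how the hardware computes it; the function name is chosen for illustration.

```python
MASK32 = 0xFFFFFFFF


def ffb(src: int) -> int:
    """Count the bits below bit 31, from bit 30 down, that equal the sign bit."""
    u = src & MASK32
    sign = (u >> 31) & 1
    count = 0
    for bit in range(30, -1, -1):
        if ((u >> bit) & 1) != sign:
            break
        count += 1
    return count
```

The three examples from the text give 0, 2, and 1 respectively, and a word whose bits all equal the sign bit (0 or $FFFFFFFF) gives the maximum value, 31.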



4.4 LOGICAL OPERATIONS 

The logical operations are similar to the arithmetic ones except that they accept either integer or boolean
operands in checked mode. A type fault occurs if the arguments are not of the same type in checked mode.

AND      Rs, Src, Rd     Rd <- Rs AND Src               Int,Int  Bool,Bool
NOT      Src, Rd         Rd <- NOT Src                  Int  Bool
OR       Rs, Src, Rd     Rd <- Rs OR Src                Int,Int  Bool,Bool
XOR      Rs, Src, Rd     Rd <- Rs XOR Src               Int,Int  Bool,Bool

4.5 COMPARISON OPERATIONS 

These operations produce a boolean result based on the comparison of the two source arguments. The 
addressing modes are the same as for the arithmetic and logical operators. 

GE       Rs, Src, Rd     Rd <- Bool:Rs >= Src           Int,Int  Bool,Bool
GT       Rs, Src, Rd     Rd <- Bool:Rs > Src            Int,Int  Bool,Bool
LE       Rs, Src, Rd     Rd <- Bool:Rs <= Src           Int,Int  Bool,Bool
LT       Rs, Src, Rd     Rd <- Bool:Rs < Src            Int,Int  Bool,Bool

A TYPE fault occurs in checked mode if Rs and Src are not either both integers or both booleans. False is
regarded as being less than True.

EQUAL    Rs, Src, Rd     Rd <- Bool:Rs = Src (Data)     Int,Int  Bool,Bool  Sym,Sym
NEQUAL   Rs, Src, Rd     Rd <- Bool:Rs != Src (Data)    Int,Int  Bool,Bool  Sym,Sym

A TYPE fault occurs in checked mode if Rs and Src are not both integers, both booleans, or both symbols.
In unchecked mode the tags are ignored and the data portions are compared.

EQ       Rs, Src, Rd     Rd <- Bool:Rs = Src (Pointer)  All but CFut or Fut
NEQ      Rs, Src, Rd     Rd <- Bool:Rs != Src (Pointer) All but CFut or Fut

Both the data and the tag have to match for two pointers to be EQ. This is different from EQUAL in
unchecked mode. These primitives only fault in checked mode if one of the arguments is either CFUT or FUT.



4.6 BRANCH OPERATIONS 

The branch instructions permit conditional and unconditional branches. Because the IP will have already
been incremented, an offset of zero will branch to the next instruction. The phase bit of the IP is always
cleared. If Src is positive, the branch is forwards; otherwise it is backwards. The Src may be a 7-bit immediate
displacement, providing a range of -64 to 63 words, or the contents of a general data register.

BR       Src             Branch forward Src words       Int
BZ       Rs, Src         Branch if Data(Rs) = 0         Int,Int
BNZ      Rs, Src         Branch if Data(Rs) != 0        Int,Int
BF       Rs, Src         Branch if Bit0(Rs) = 0         Bool,Int
BT       Rs, Src         Branch if Bit0(Rs) = 1         Bool,Int
BNIL     Rs, Src         Branch if Rs = NIL             All but CFut,Int
BNNIL    Rs, Src         Branch if Rs != NIL            All but CFut,Int

In unchecked mode, BNIL and BNNIL compare both the tag and data with NIL. BF and BT inspect only
the least significant bit of the data portion, while BZ and BNZ compare the full data portion but do not
check the tag.



4.7 NETWORK OPERATIONS 



SEND     Src, P          Send Src at priority P         All but CFut
SENDE    Src, P          Send Src and terminate         All but CFut
SEND2    Src, Rs, P      Send Src then Rs               All but CFut
SEND2E   Src, Rs, P      Send Src then Rs and terminate All but CFut

Send one or two words onto the network. When two words are sent, the one from Src is sent before the word
in Rs; note the unusual assembler syntax order of Src and Rs. SENDE and SEND2E indicate
the end of the message to the network hardware after the words they send. SEND and SEND2 set the M
Flag for the priority to which the message is being sent, while SENDE and SEND2E clear the M Flag. The
op2 field is used to encode which message priority to send the message on.



4.8 SPECIAL INSTRUCTIONS 



NOP No Operation 

Used to pad instructions when aligning for the target of a branch instruction or to accommodate a long 
constant. 

SUSPEND Terminate Thread 

Pops the current message from the message queue and schedules the next message. Should not be called 
from the background. 

CALL Src Call system routine Src Int 

Fault using the vector at 128 + (Src mod 64). Src must be an integer unless the U flag is set. 




4.9 NAME CACHE OPERATIONS 



ENTER    Src, Rs         Enter(Src) into Rs             All but CFut
XLATE    Rs, Dst, C      Dst <- lookup in Rs            All but CFut
PROBE    Src, Rs         Rd <- Bool:Src is in Rs        All but CFut

The ENTER instruction enters Rs and Src into the associative table so that associative_lookup(Src) = Rs. That
is, Src is the key and Rs is the data. The slot used is picked at random except when associative_lookup(Src)
already existed, in which case the old value is overwritten.

XLATE recovers the value entered using the key Rs. An XLATE fault is signalled if no entry was found in
the table or if the associated data value for Rs is NIL. The constant field C provides a way for the XLATE
exception code to know what circumstances surrounded the failed translation so it can behave appropriately.
When XLATE'ing into an address register the key being XLATE'd is written into the corresponding ID
register. The PROBE instruction is similar to XLATE except that it does not fault if Rs is not found. If Rs is there,
Dst <- Lookup(Rs), else Dst <- NIL.
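
The difference between XLATE and PROBE can be sketched with a dictionary-backed Python model. Everything named here (`NameCache`, `XlateFault`, `NIL`) is hypothetical; the model also ignores the finite size of the real table and its random slot replacement, so entries are never evicted.

```python
NIL = None  # stand-in for the MDP's NIL value


class XlateFault(Exception):
    """Raised where the hardware would signal an XLATE fault."""


class NameCache:
    def __init__(self):
        self._table = {}

    def enter(self, key, data):
        # ENTER: associate data with key, overwriting any existing entry.
        self._table[key] = data

    def xlate(self, key):
        # XLATE: fault if the key is missing or its data is NIL.
        data = self._table.get(key, NIL)
        if data is NIL:
            raise XlateFault(key)
        return data

    def probe(self, key):
        # PROBE: like XLATE, but a miss returns NIL instead of faulting.
        return self._table.get(key, NIL)
```

A typical use would be to `probe` when a miss is an expected outcome, and to `xlate` when a miss should be handed to the fault handler.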

INVAL Invalidate address register 

Invalidate all relocatable address registers (ones with the R bit set) on both priority levels by copying the R 
bit into the I bit. 



4.10 MDP BUGS

The current implementation of the Message Driven Processor has a small number of bugs that the low-level
programmer must be aware of. The file "~mdp/doc/bugs_and_workarounds" lists the known bugs
and how to work around them.



Chapter 5 

PROGRAMMING EXAMPLES 



In this section we provide several simple example programs written using the assembler that is incorporated 
into MDPSim. We also discuss how to run the assembler and the use of standard include files. 

5.1 THE FORM OF A TYPICAL PROGRAM

A typical MDP assembly language program is of the following form: 
        INCLUDE "/home/jm/mdp/include/hw.i"

        ;;; Define the location and size of the message queues
        LABEL   qbm0_base = $0200
        LABEL   qbm0_size = $0100
        LABEL   qbm1_base = $0300
        LABEL   qbm1_size = $0080

        ;;; Define the location and size of the name cache
        LABEL   tbm_base = $0380
        LABEL   tbm_size = $0080

        <more labels>

        MODULE

        ORG     reset_background_ip

        <initialize external memory>
        <initialize name cache>
        <initialize the network interface>

        <application code>

        <message handlers>

        <fault handlers>

        <call handlers>

        ORG     fault_vec_addr_p0
        <priority zero fault vector>

        ORG     fault_vec_addr_p1
        <priority one fault vector>

        ORG     syscall_vec_addr
        <system call table>

        END



Quite frequently the programmer will use a set of standard fault handlers and associated vectors. The file
"~mdp/include/std_flt.i" contains a handler that saves the state of the processor core into memory and then
idles, and a set of simple stubs that record which fault occurred and then call this main routine. This state
can be inspected by utilities in the monitor. If the user wishes to supply their own handler for certain faults,
then they can do so explicitly during program initialisation. In this case the program has the following form:

        INCLUDE "/home/jm/mdp/include/hw.i"

        ;;; Define the location and size of the message queues
        LABEL   qbm0_base = $0200
        LABEL   qbm0_size = $0100
        LABEL   qbm1_base = $0300
        LABEL   qbm1_size = $0080

        ;;; Define the location and size of the name cache
        LABEL   tbm_base = $0380
        LABEL   tbm_size = $0080

        <more labels>

        MODULE

        ORG     reset_background_ip

        <initialize external memory>
        <override certain trap handlers>
        <initialize name cache>
        <initialize the network interface>

        <application code>

        <message handlers>

        <additional fault handlers>

        <additional call handlers>

        INCLUDE "/home/jm/mdp/include/std_flt.i"

        END

This program can be loaded directly into the MDP instruction level simulator MDPSim. Alternatively,
it can be assembled using MDPSim and then loaded into the register transfer level simulator or onto the
J-Machine. Assembling a program for the J-Machine is accomplished by

unix> MDPSim -o test.bin test.mdp

and then this program can be loaded and run using jmon. The use of MDPSim is fully documented in CVA
Memo 38 "Message-Driven Processor Simulator". All of the examples provided below are included in the
directory "~mdp/examples" so that you can run them for yourself and trace their execution.

5.2 INITIALIZING THE MDP 

This program demonstrates the steps required to initialise an MDP. It concentrates on these key steps and 
little else. We include the mdp assembly language program itself and a jmon script for running this program. 
Please refer to the jmon manual for more information on jmon scripts. It assumes that the front end has 
placed the node-id for every node in a known location in memory. The program will be downloaded through 
the diagnostic port to all the nodes at one time and then this location will be written on each node in turn. 

This program accesses registers for both P0 and P1 from the background. A special syntax is used to
specify which priority is to be used. In this code, the embedded comments clarify which priority is being 
used. Accessing registers among the priorities is clarified in a later example. 



;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Michael Noakes            init-mdp.mdp                  Oct 1, 1991
;
; This program demonstrates the key steps that must occur to initialize an
; MDP. It is assumed that the front-end supplies the NNR in memory.
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

        INCLUDE "/home/jm/mdp/include/hw.i"

        ;;; Define the location and size of the message queues
        LABEL   qbm0_base = $0200
        LABEL   qbm0_size = $0100
        LABEL   qbm1_base = $0300
        LABEL   qbm1_size = $0080

        ;;; Define the location and size of the name cache
        LABEL   tbm_base = $0380
        LABEL   tbm_size = $0080

        ;;; Do not change these locations without changing JMON scripts also
        LABEL   NnrTable = $0120

        LABEL   NodeNNR = 0             ; This node's NNR or NIL
        LABEL   MinNNR = 1              ; The min NNR in the cube
        LABEL   MaxNNR = 2              ; The max NNR in the cube

        MODULE

        ORG     reset_background_ip

        ;;; Initialize the refresh timer and error counter
        dc      emi_rtc                 ; address of timer register
        move    r0, r1
        dc      emi_rtc_init            ; initial count
        move    r0, [r1, a0]            ; store into timer
        dc      emi_erc                 ; the error count register
        move    0, r1
        move    r1, [r0, a0]

        ;;; Initialize the QBM and QHL registers for P0
        dc      ADDR:(qbm0_base << addr_base_pos) | 0
        move    r0, qhl
        dc      ADDR:(qbm0_base << addr_base_pos) | (qbm0_size - 1)
        move    r0, qbm

        ;;; Initialize the QBM and QHL registers for P1
        dc      ADDR:(qbm1_base << addr_base_pos) | 0
        move    r0, qhl'
        dc      ADDR:(qbm1_base << addr_base_pos) | (qbm1_size - 1)
        move    r0, qbm'

        ;;; Initialize the name cache
        dc      ADDR:(tbm_base << tbm_base_pos) | (tbm_size - 1)
        move    r0, tbm

        ;;; Set the SEND and EARLY fault handler specially
        dc      ADDR:(fault_vec_addr_p0 << addr_base_pos) | fault_vec_len
        move    r0, a1
        dc      ADDR:(fault_vec_addr_p1 << addr_base_pos) | fault_vec_len
        move    r0, a2
        dc      IP:ip_u | ip_f | (RETRY << ip_offset_pos) | ip_a0_absolute
        move    r0, [fault_send, a1]
        move    r0, [fault_send, a2]
        move    r0, [fault_early, a1]
        move    r0, [fault_early, a2]

        ;;; Fetch the node-id and store it in the NNR
        dc      NodeNNR
        move    [r0, a0], r1
        move    r1, NNR

        ;;; Send messages to self to initialize the routers
        dc      MSG:(INITR << addr_base_pos) | 1
        send2e  r1, r0, 1
        send2e  r1, r0, 0

        ;;; Accept interrupts
        move    0, r0
        move    r0, I

        ;;; Wait for messages to run through the datapath
        move    4, r0
Idle1:  sub     r0, 1, r0
        bnz     r0, ^Idle1

Done:   br      ^Done

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Message INITR()
;
; Sent by a node to initialize its router datapath.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

INITR:  suspend

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; The custom FAULT handlers
;
; RETRY: Waste some time while restarting the instruction
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

        ;;; Subtract a phase from FIP the sneaky way and retry
RETRY:  move    r0, fop0
        move    fip, r0
        rot     r0, -ip_p_pos, r0
        sub     r0, 1, r0
        rot     r0, ip_p_pos, r0
        move    r0, fip
        move    fop0, r0
        ldipr   fip

        ; Include standard fault vectors
        INCLUDE "/home/jm/mdp/include/std_flt.i"

        END



Here is a jmon script that executes this program, though running it generates no interesting data.
The script is included to clarify how to initialise this program.

/****************************************************************/
/*                                                              */
/* Mike Noakes           init-mdp.j              Oct 1, 1991    */
/*                                                              */
/* This script initializes and runs a program that demonstrates */
/* the basic process required to initialize the MDP.            */
/*                                                              */
/****************************************************************/

/* Include the standard jmon utilities */
include "std_util.j"

/* The address where the MDP expects to find its NNR */
NodeNNR = $120;

/* Run the program on a machine of the indicated size */
defun InitMdp (xsize ysize zsize) {

  make_mdp xsize ysize zsize;           /* Make the machine */

  select_all;
  reset;

  load "init-mdp.bin";                  /* Load the program */

  /* Select each node in turn and store its NNR */
  for (i = 0; i < _NODES; i = i + 1) {
    select i;
    sm NodeNNR (nodeid_to_nnr i);
  }

  select_all;

  /* Start processor and run cycle until first instruction */
  run 1;

  /* Watch the register file at the current priority */
  if (_JMI == "RTL")
    .watch cur_state;

  run 30;
  halt;
}

InitMdp 1 1 1;
quit;



5.3 INITIALIZING THE NNRs 

This program example demonstrates the general principles of initialising an MDP and one method for
assigning node-ids to every node. This program runs on a J-Machine without a network interface to the host
machine; it relies on the use of several memory locations known to both the program and the jmon interface
to communicate the configuration of the machine.



;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Michael Noakes            init-nnr.mdp                  Oct 4, 1991
;
; This program demonstrates the initialization of an MDP and implements a
; simple distributed scheme for assigning node-ids to each node. This
; program is loaded onto every node. One processor is selected to be the
; driver node. Every node learns the min and max node-ids of the machine
; using memory locations known to both the MDP and the jmon front-end.
; A third location indicates whether this node is to be the master.
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

        INCLUDE "/home/jm/mdp/include/hw.i"

        ;;; Define the location and size of the message queues
        LABEL   qbm0_base = $0200
        LABEL   qbm0_size = $0100
        LABEL   qbm1_base = $0300
        LABEL   qbm1_size = $0080

        ;;; Define the location and size of the name cache
        LABEL   tbm_base = $0380
        LABEL   tbm_size = $0080

        ;;; Do not change these locations without changing JMON scripts also
        LABEL   NnrTable = $0120
        LABEL   NnrTableLength = 4
        LABEL   MinNNR = 0              ; The min NNR in the cube
        LABEL   NodeNNR = 1             ; This node's NNR or NIL
        LABEL   MaxNNR = 2              ; The max NNR in the cube
        LABEL   InitMsgs = 3            ; The number of init messages

        MODULE

        ORG     reset_background_ip

        ;;; Initialize the refresh timer and error counter
        dc      emi_rtc                 ; address of timer register
        move    r0, r1
        dc      emi_rtc_init            ; initial count
        move    r0, [r1, a0]            ; store into timer
        dc      emi_erc                 ; the error count register
        move    0, r1
        move    r1, [r0, a0]

        ;;; Initialize the QBM and QHL registers
        dc      ADDR:(qbm0_base << addr_base_pos) | 0
        move    r0, qhl
        dc      ADDR:(qbm0_base << addr_base_pos) | (qbm0_size - 1)
        move    r0, qbm
        dc      ADDR:(qbm1_base << addr_base_pos) | 0
        move    r0, qhl'
        dc      ADDR:(qbm1_base << addr_base_pos) | (qbm1_size - 1)
        move    r0, qbm'

        ;;; Initialize the name cache
        dc      ADDR:(tbm_base << tbm_base_pos) | (tbm_size - 1)
        move    r0, tbm

        ;;; Set the SEND and EARLY fault handler specially
        dc      ADDR:(fault_vec_addr_p0 << addr_base_pos) | fault_vec_len
        move    r0, a1
        dc      ADDR:(fault_vec_addr_p1 << addr_base_pos) | fault_vec_len
        move    r0, a2
        dc      IP:ip_u | ip_f | (RETRY << ip_offset_pos) | ip_a0_absolute
        move    r0, [fault_send, a1]
        move    r0, [fault_send, a2]
        move    r0, [fault_early, a1]
        move    r0, [fault_early, a2]

        ;;; Accept interrupts
        move    0, r0
        move    r0, I

        ;;; Wait until the NNR is set. This can be done by front-end or message
        dc      addr:(NnrTable << addr_base_pos) | NnrTableLength
        move    r0, a1
Wait:   move    [NodeNNR, a1], r1
        bnil    r1, ^Wait

        ;;; Store it in the NNR
        move    r1, NNR

        ;;; Send messages to self to initialize the routers
        dc      MSG:(INITR << addr_base_pos) | 1
        send2e  r1, r0, 1
        send2e  r1, r0, 0

        ;;; Fetch the Max NNR
        move    [MaxNNR, a1], r2

;;;;;;;;;;;;;;;;;;;;;;;;; Check the Z-dimension ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

        ;;; IF this node "above" the origin node?
ChkZ:   sub     r1, [MinNNR, a1], r3
        and     r3, $3FF, r3
        bnz     r3, ^ChkY

        ;;; AND not on the top board
        dc      int:$7C00
        sub     r2, r1, r3
        and     r3, r0, r3
        bz      r3, ^ChkY

        ;;; THEN send the SETNNR to the node above
        dc      1 << 10
        add     r1, r0, r3
        dc      msg:(SETNNR << addr_base_pos) | 4
        send2   r3, r0, 0
        send2   [MinNNR, a1], r3, 0
        sende   r2, 0

;;;;;;;;;;;;;;;;;;;;;;;;; Check the Y-dimension ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

        ;;; IF this node in the first column
ChkY:   dc      $1F
        sub     r1, [MinNNR, a1], r3
        and     r3, r0, r3
        bnz     r3, ^ChkX

        ;;; AND not at the top of the column
        dc      int:$03E0
        sub     r2, r1, r3
        and     r3, r0, r3
        bz      r3, ^ChkX

        ;;; THEN send the SETNNR to the node in the positive Y-dim
        dc      1 << 6
        add     r1, r0, r3
        dc      msg:(SETNNR << addr_base_pos) | 4
        send2   r3, r0, 0
        send2   [MinNNR, a1], r3, 0
        sende   r2, 0

;;;;;;;;;;;;;;;;;;;;;;;;; Check the X-dimension ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

        ;;; IF this node is not in the last column
ChkX:   dc      int:$001F
        sub     r2, r1, r3
        and     r3, r0, r3
        bz      r3, ^Done

        ;;; THEN send the SETNNR to the node in the positive X-dim
        add     r1, 1, r3
        dc      msg:(SETNNR << addr_base_pos) | 4
        send2   r3, r0, 0
        send2   [MinNNR, a1], r3, 0
        sende   r2, 0

Done:   br      ^Done

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Message INITR()
;
; Sent by a node to initialize its router datapath.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

INITR:  suspend

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Message SETNNR (minnnr nodennr maxnnr)
;
; Sent by a neighbor to pass us our NNR. Copy the args into memory
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

SETNNR: dc      addr:(NnrTable << addr_base_pos) | NnrTableLength
        move    r0, a1

        ;;; Copy the arguments from the message into memory
        move    [1, a3], r0
        move    r0, [MinNNR, a1]
        move    [2, a3], r0
        move    r0, [NodeNNR, a1]
        move    [3, a3], r0
        move    r0, [MaxNNR, a1]

        ;;; Increment the counter (Zeroed by the front end)
        move    [InitMsgs, a1], r0
        add     r0, 1, r0
        move    r0, [InitMsgs, a1]

        suspend

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; The custom FAULT handlers
;
; RETRY: Waste some time while restarting the instruction. This
; handler must disable interrupts while it computes the previous
; IP because it needs to use fop0 to store R0 and yet can be run
; in the background as well as at P0. Since background and P0
; share FOP0 there is a risk of it getting clobbered.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

        ;;; Turn off interrupts without using R0
RETRY:  bt      r0, ^RETRY1
        xor     r0, 1, r0
        move    r0, I
        xor     r0, 1, r0
        br      ^RETRY2
RETRY1: move    r0, I

        ;;; Subtract a phase from FIP the sneaky way using R0
RETRY2: move    r0, fop0
        move    fip, r0
        rot     r0, -ip_p_pos, r0
        sub     r0, 1, r0
        rot     r0, ip_p_pos, r0
        move    r0, fip
        move    fop0, r0

        ;;; Turn interrupts back on without clobbering R0 and return
        bt      r0, ^RETRY3
        move    r0, I
        ldipr   fip

RETRY3: xor     r0, 1, r0
        move    r0, I
        xor     r0, 1, r0
        ldipr   fip

        ; Include standard fault vectors
        INCLUDE "/home/jm/mdp/include/std_flt.i"

        END


An interesting point to note about this example is that the retry handler, used to delay in the event of
a send fault, is considerably more complicated than in the previous example. This handler is more robust
than the previous one and has been included to demonstrate a difficult aspect of taking faults in the background.
Most trap handlers need to free one or two of the general registers for scratch use, and so these must be
copied somewhere safe. One popular place is the FOP0 and FOP1 registers, if they are available, and another
is the low slots in the priority-switchable memory, since they can be accessed directly. Unfortunately, the
background and priority zero share both of these areas, assuming that the P-bit is still set to 0. This is a
potential risk when working in the background because it is generally possible that a dispatch to priority
zero might occur and then that message might also take a fault.

One could take the stance that the background should be used very carefully and should not generate 
faults. This is perhaps a good goal, and certainly this example could be written that way, but it is very 
tempting to use background and it is often quite natural to do interesting things there. This retry handler 
demonstrates a very neat trick for disabling interrupts before saving R0 without affecting R0. It relies on the 
observation that the sense of a boolean flag, i.e. the I-bit, depends only on the low bit of the value written 
to it. If R0 already contains a 1 in the least-significant bit then we need just write R0 to I. If it contains 0 in 
the lsb, then we can reversibly flip it to 1 using the XOR instruction. A similar process enables interrupts 
at the end of the handler. Of course, this handler must run in unchecked mode. 
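
The trick can be modeled in a few lines of Python. This is a sketch of the reasoning only: `write_flag_preserving` is a hypothetical helper, the flag is represented as the low bit of whatever value is written to it, and the register is an ordinary integer.

```python
def write_flag_preserving(r0: int, desired: int):
    """Write `desired` (0 or 1) to a one-bit flag using only r0 as the source.

    Returns (flag, r0_after). The flag takes the low bit of the value written,
    so if r0's low bit is already correct we write r0 directly; otherwise we
    flip the low bit with XOR, write, and flip it back.
    """
    if (r0 & 1) == desired:
        flag = r0 & 1          # move r0, I
    else:
        r0 ^= 1                # xor r0, 1, r0
        flag = r0 & 1          # move r0, I
        r0 ^= 1                # xor r0, 1, r0 (restores the original value)
    return flag, r0
```

In both branches the register ends with its original contents, which is what allows the handler to change the I-bit before it has saved R0 anywhere.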

Here is a jmon script to run this application on the J-Machine hardware. It prepares the global flags and
verifies that the correct NNRs were installed.

/****************************************************************/
/*                                                              */
/* Mike Noakes           init-nnr.j              Oct 4, 1991    */
/*                                                              */
/* This script initializes and runs a program that demonstrates */
/* a simple technique to distribute NNRs.                       */
/*                                                              */
/****************************************************************/

/* Include the standard jmon utilities */
include "std_util.j"

/* The address where the MDP expects to find its NNR */
MinNNR = $120;
NodeNNR = $121;
MaxNNR = $122;
InitMsgs = $123;

/* Run the program on a machine of the indicated size */
defun InitNNR (xsize ysize zsize) {

  make_mdp xsize ysize zsize;           /* Make the machine */

  select_all;
  reset;

  load "init-nnr.bin";                  /* Load the program */

  sm MinNNR 0:0;                        /* Set to NIL */
  sm NodeNNR 0:0;                       /* Set to NIL */
  sm MaxNNR 0:0;                        /* Set to NIL */
  sm InitMsgs 1:0;                      /* Set to zero */

  /* Set node 0 specially */
  select 0;
  sm MinNNR (nodeid_to_nnr 0);
  sm NodeNNR (nodeid_to_nnr 0);
  sm MaxNNR (nodeid_to_nnr _NODES - 1);
  sm InitMsgs 1:1;

  /* Run */
  select_all;
  run 10000;
  halt;

  /* Check the answers */
  for (i = 0; i < _NODES; i = i + 1) {
    select i;

    if ((nodeid_to_nnr i) != (rmem NodeNNR))
      printf "Node %2d has the wrong NNR: <%d %d %d> rather than <%d %d %d>\n"
        i
        (NNRx (rmem NodeNNR))
        (NNRy (rmem NodeNNR))
        (NNRz (rmem NodeNNR))
        (NNRx (nodeid_to_nnr i))
        (NNRy (nodeid_to_nnr i))
        (NNRz (nodeid_to_nnr i));

    if (:?(rmem InitMsgs) != 1)
      printf "Node %2d received %d init messages, not 1 as expected\n"
        i
        :?(rmem InitMsgs);
  }
}


5.4 ACCESSING OTHER PRIORITIES 

One issue that often causes people to stumble is the appropriate syntax for accessing registers in each priority
from each priority. The following code demonstrates the use of the 'b' and ''' suffixes for describing accesses
to registers at other priorities as well as the use of the priority-switchable memory.

In this program the registers R3, QHL, and relative addresses 0 and $40 are accessed from each priority
using all possible combinations of the 'b' and ''' suffixes. Specifying a suffix of 'b' sets the background flag
and specifying ''' sets the priority flag. If the selected register is implemented at all three priorities, as R3 is,
then the background flag is XORed with the processor's B-bit. If the result is 1 then the background register
is selected and the priority flag is ignored. If the result is 0 then the priority flag is used: a priority flag of 0
selects the register at the priority indicated by the processor's P-bit, and a priority flag of 1 selects the
opposite priority. If the register does not occur in the background, e.g. QHL, then the background flag is
ignored and the priority flag selects the priority relative to the P-bit.
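
The selection rule just described can be restated as a short Python function. This is a model of the rule as worded above, not of the hardware; the function name, argument names, and the strings "BK", "P0", and "P1" are chosen for illustration.

```python
def select_register(has_background: bool, b_suffix: bool, prime_suffix: bool,
                    B: int, P: int) -> str:
    """Return which copy of a register an access refers to.

    has_background: the register exists at all three priorities (e.g. R3);
                    False for registers such as QHL.
    b_suffix:       the 'b' suffix was written (background flag).
    prime_suffix:   the ''' suffix was written (priority flag).
    B, P:           the processor's current B-bit and P-bit.
    """
    if has_background and (int(b_suffix) ^ B) == 1:
        return "BK"                      # priority flag is ignored
    return f"P{P ^ int(prime_suffix)}"   # current P, or the opposite one
```

For instance, from the background (B = 1, P = 0), `r3` selects the background copy, `r3b` selects the P0 copy, and `r3b'` selects the P1 copy, which matches the comments in the listing that follows.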

This program does two things that are unusual and should normally be avoided. Most seriously, it 
executes a small number of instructions in the background with P set to 1. We define the background to 
be B = 1 and P = 0 for the reason demonstrated by this code; the P bit will affect the register selected in 
some cases and code would be hard to understand if programmers ignore this convention. It also switches 
priority by explicitly modifying the B and P flags rather than by sending messages. This was done to help 
concentrate on the effects of altering the various flags. It is not recommended practice for most programs 
but the effects are interesting and it is conceivable that a user might have a purpose for doing this. Note 
carefully that it is necessary to prepare the instruction pointer before changing the flags. 



;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Michael Noakes              pri.mdp                     Oct 1, 1991
;
; This program demonstrates the use of register mode addressing to select
; registers in other priorities. It also demonstrates the ability to switch
; priorities without sending messages by manipulating the B and P bit
; explicitly.
;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

        INCLUDE "/home/jm/mdp/include/hw.i"
        MODULE

        ORG     reset_background_ip

        ;;; Place known values into each of the registers
        move    1, r0
        move    r0, r3
        move    2, r0
        move    r0, r3b
        move    3, r0
        move    r0, r3b'
        dc      addr:($400 << 10)
        move    r0, qhl
        dc      addr:($500 << 10)
        move    r0, qhl'
        move    0, r0
        move    r0, [r0, a0]
        dc      $40
        move    r0, [r0, a0]

        ;;; From Background B = 1 P = 0 A0-Absolute
test0:  readr   r3, r0                  ; Fetch R3BK   Use B, ignore P
        readr   r3b, r0                 ; Fetch R3P0   Flip B, current P
        readr   r3b', r0                ; Fetch R3P1   Flip B, other P
        readr   r3', r0                 ; Fetch R3BK   Use B, ignore P
        readr   qhl, r0                 ; Fetch QHLP0. Current P
        readr   qhlb, r0                ; Fetch QHLP0. Current P
        readr   qhl', r0                ; Fetch QHLP1. Other P
        readr   qhlb', r0               ; Fetch QHLP1. Other P
        move    0, r0
        move    [r0, a0], r1            ; Fetch physical address 0
        dc      $40
        move    [r0, a0], r1            ; Fetch physical address $40

        ;;; Switch to the background with P = 1.
        ;;; This is not recommended practice
        move    1, r0
        move    r0, P

test1:  readr   r3, r0                  ; Fetch R3BK   Use B, ignore P
        readr   r3b, r0                 ; Fetch R3P1   Flip B, current P
        readr   r3b', r0                ; Fetch R3P0   Flip B, other P
        readr   r3', r0                 ; Fetch R3BK   Use B, ignore P
        readr   qhl, r0                 ; Fetch QHLP1. Current P
        readr   qhlb, r0                ; Fetch QHLP1. Current P
        readr   qhl', r0                ; Fetch QHLP0. Other P
        readr   qhlb', r0               ; Fetch QHLP0. Other P
        move    0, r0
        move    [r0, a0], r1            ; Fetch physical address $40
        dc      $40
        move    [r0, a0], r1            ; Fetch physical address $0

        ;;; Switch to Priority 0 explicitly. B = 0, P = 0. A0-Absolute
        dc      IP:ip_u | (test2 << ip_offset_pos) | ip_a0_absolute
        move    r0, ipb'
        move    0, r0
        move    r0, P
        move    r0, B

        ;;; This code runs at priority zero
test2:  readr   r3, r0                  ; Fetch R3P0   Current P
        readr   r3b, r0                 ; Fetch R3BK   Flip B, ignore P
        readr   r3b', r0                ; Fetch R3BK   Flip B, ignore P
        readr   r3', r0                 ; Fetch R3P1   Other P
        readr   qhl, r0                 ; Fetch QHLP0. Current P
        readr   qhlb, r0                ; Fetch QHLP0. Current P
        readr   qhl', r0                ; Fetch QHLP1. Other P
        readr   qhlb', r0               ; Fetch QHLP1. Other P
        move    0, r0
        move    [r0, a0], r1            ; Fetch physical address $0
        dc      $40
        move    [r0, a0], r1            ; Fetch physical address $40

        ;;; Switch to Priority 1 explicitly. B = 0, P = 1. A0-Absolute
        dc      IP:ip_u | (test3 << ip_offset_pos) | ip_a0_absolute
        move    r0, ip'
        move    1, r0
        move    r0, P

        ;;; This code runs at priority one
test3:  readr   r3, r0                  ; Fetch R3P1   Current P
        readr   r3b, r0                 ; Fetch R3BK   Flip B, ignore P
        readr   r3b', r0                ; Fetch R3BK   Flip B, ignore P
        readr   r3', r0                 ; Fetch R3P0   Other P
        readr   qhl, r0                 ; Fetch QHLP1. Current P
        readr   qhlb, r0                ; Fetch QHLP1. Current P
        readr   qhl', r0                ; Fetch QHLP0. Other P
        readr   qhlb', r0               ; Fetch QHLP0. Other P
        move    0, r0
        move    [r0, a0], r1            ; Fetch physical address $40
        dc      $40
        move    [r0, a0], r1            ; Fetch physical address $0

        ;;; All access done
idle:   br      ^idle

        END


This program was assembled by typing
unix> MDPSim -o pri.bin pri.mdp

at the command line while in the examples directory. There is also a jmon script file that runs this
example while "watching" the register file under the register-transfer-level simulator. The RTL simulator is
too slow to be valuable for general program development, but it is quite adequate for this program and it
does a reasonable job of describing the register file. Here is the jmon script that can be used to run this
experiment:

/***************************************************************/
/*                                                             */
/* Mike Noakes             pri.j                Oct 1, 1991    */
/*                                                             */
/* A simple jmon script to run a program that demonstrates the */
/* MDPSim assembler syntax for accessing priorities. This      */
/* script relies on special commands in the RTL simulator to   */
/* view the register file.                                     */
/*                                                             */

/* Load the program */
load "pri.bin";

/* Use low-level commands built into the RTL simulator */
if (_JMI == "RTL") {
    .reset                      /* Reset the MDP and issue "GO" */
    .dec                        /* Input base is decimal */
    .watch all_state            /* Watch the entire register file */
    .execute until 206          /* Execute until cycle 205 */
}

/* Exit JMON */
quit;

It can be executed by doing
unix> rtl -r pri.j > pri.log

which explicitly requests the RTL simulator when running jmon and which will dump the output to the
file pri.log. You will want to widen your emacs a little compared to the typical default to view this file.



5.5 LONG JUMPS 

It was noted that the range available for immediate branches of the form
        bt      r0, ^Target

is -64 to +63 words. This span is sufficient for typical application code, especially in an environment in
which we expect the total method length to be on this order, but it is occasionally inadequate. There are two
standard methods for making jumps to targets further away than this: compute the desired value
for the instruction pointer and load it directly, or compute the relative distance into a register and branch
by this amount.

;;; Compute the new IP. Run in checked mode and A0-absolute
        dc      ip:(Target << ip_offset_pos) | ip_a0_absolute
        ldip    r0

;;; Compute the branch distance
        dc      Target - (* + 2)
        br      r0

        <code>
Target: <more code>

There is little to choose between these approaches. The principal disadvantage of the former technique is
that the user is then responsible for preserving the various flags that are stored in the IP. Of course, this is
a good solution if you wish to explicitly alter these flags before running the next block. The second solution
avoids this issue but is, perhaps, a little quirkier. The '*' is a pseudo-variable maintained by the MDPSim
assembler that contains the value of the current code offset. The factor of 2 is added to compensate for the
instruction prefetch mechanism of the MDP.
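
The arithmetic behind the second method can be checked with a short Python sketch. It only restates the expression Target - (* + 2) and the -64 to +63 word range quoted above. The function and constant names are hypothetical and are not part of MDPSim, and offsets are treated as plain integers in whatever unit the assembler uses for '*'.

```python
# Illustrative sketch only; the names below are hypothetical, not MDPSim features.
PREFETCH_ADJUST = 2                       # the "+ 2" in the dc expression from the text
IMMEDIATE_MIN, IMMEDIATE_MAX = -64, 63    # immediate branch range quoted in the text


def branch_distance(target_offset: int, current_offset: int) -> int:
    """Evaluate the expression Target - (* + 2) for the given code offsets."""
    return target_offset - (current_offset + PREFETCH_ADJUST)


def in_immediate_range(displacement: int) -> bool:
    """Return True if a displacement lies within the -64..+63 immediate range."""
    return IMMEDIATE_MIN <= displacement <= IMMEDIATE_MAX


if __name__ == "__main__":
    distance = branch_distance(target_offset=300, current_offset=100)
    print(distance, in_immediate_range(distance))
```

A displacement that fails the range check is a candidate for one of the two long-jump forms shown above.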






Chapter 6 

Appendix A 



This appendix collects the key tables in one place. It shows the format of the 13 defined types and the format
of the register file, summarises the instruction set of the processor, and lists the 19 defined faults with
their priorities.



Tag     Type    Format
0000    SYM     Value (0 = NIL)
0001    INT     Two's Complement Value
0010    BOOL    0 | b
0011    ADDR    r | i | base | length
0100    IP      u | f | offset | p | a | 0 0
0101    MSG     u | f | offset | length
0110    CFUT    user-defined
0111    FUT     user-defined
1000    TAG8    user-defined
1001    TAG9    user-defined
1010    TAGA    user-defined
1011    TAGB    user-defined
11      INST0   0 0 first instruction | second instruction
11      INST1   0 1 first instruction | second instruction
11      INST2   1 0 first instruction | second instruction
11      INST3   1 1 first instruction | second instruction
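
As an illustration of how the tag assignments above can be used, here is a minimal Python sketch that splits a word into its type name and data field. It assumes a 36-bit word with the 4-bit tag held above 32 data bits; that layout, and the names `TAG_NAMES` and `split_word`, are assumptions made for this example rather than definitions from this manual.

```python
# Illustrative sketch; assumes a 36-bit word: 4-bit tag above 32 data bits.
TAG_NAMES = [
    "SYM", "INT", "BOOL", "ADDR", "IP", "MSG", "CFUT", "FUT",
    "TAG8", "TAG9", "TAGA", "TAGB", "INST0", "INST1", "INST2", "INST3",
]
DATA_BITS = 32
WORD_BITS = 36


def split_word(word: int) -> tuple[str, int]:
    """Return (type name, data field) for a tagged word."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word does not fit in 36 bits")
    tag = word >> DATA_BITS
    data = word & ((1 << DATA_BITS) - 1)
    return TAG_NAMES[tag], data


if __name__ == "__main__":
    print(split_word((0b0001 << DATA_BITS) | 5))
```

For the four INST tags the two low tag bits are also the leading bits of the first instruction, so the data field returned here does not contain the whole of that instruction.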






Register Set 



Register        Tag     Fields                          Copies
R0 - R3         tag     data                            All
A0 - A3         0011    r | i | base | length           All
IP              0100    u | f | offset | p | a | 0 0    All
                0100    offset
FOP0 - FOP1     tag     data
FIR             1100    instruction
NNR             0001    0 | zdim | ydim | xdim          Global
QBM             0011    x | x | base | mask             P0 P1
QHL             0011    x | x | head | length           P0 P1
TBM             0011    x | x | base | mask             Global
ID0 - ID3       tag     data                            P0 P1
MAR             0001    0 | memory address              Global
Status Flag     0010    0 | S

Status Flag S ∈ { I, B, P, U, F, Q, M }



Flag    Meaning                 0                       1
I:      Interrupt Mask          Interrupts Allowed      Interrupts Disabled
B:      Background Execution    Message                 Background
P:      Priority Level          Level 0                 Level 1
U:      Unchecked Mode          Checked                 Unchecked
F:      Faults Disabled         Normal                  Faults Disallowed
Q:      Queue Wrap Around       A3 Points Into Memory   A3 Points Into Queue
M:      Message Flag            Message Send Complete   Message Being Sent






General Movement and Type Instructions 


                                                        Source Types
READ    Src, Rd         Rd <- Src                       All but CFut
READR   Src, Rd         Rd <- Src                       All but CFut
WRITE   Rs, Dst         Dst <- Rs                       All
WRITER  Rs, Dst         Dst <- Rs                       All
LDIP    Src             IP <- Src                       IP
LDIPR   Src             IP <- Src                       IP
CHECK   Rs, Src, Rd     Rd <- BOOL:tag(Rs) == Src       All,Int
RTAG    Src, Rd         Rd <- INT:tag(Src)              All but CFut
WTAG    Rs, Src, Rd     Rd <- Src:Rs                    All,Int



Arithmetic and Logical Instructions 



ADD     Rs, Src, Rd     Rd <- Rs + Src                  Int,Int
CARRY   Rs, Src, Rd     Rd <- Carry(Rs + Src)           Int,Int
SUB     Rs, Src, Rd     Rd <- Rs - Src                  Int,Int
NEG     Src, Rd         Rd <- -Src                      Int
MUL     Rs, Src, Rd     Rd <- Low 32 bits of Rs*Src     Int,Int
MULH    Rs, Src, Rd     Rd <- High 32 bits of Rs*Src    Int,Int
ASH     Rs, Src, Rd     Rd <- Rs << Src (arithmetic)    Int,Int
LSH     Rs, Src, Rd     Rd <- Rs << Src (logical)       Int,Int
ROT     Rs, Src, Rd     Rd <- Rs rotate left Src        Int,Int
FFB     Src, Rd         Rd <- Find First Bit            Int
AND     Rs, Src, Rd     Rd <- Rs AND Src                Int,Int  Bool,Bool
NOT     Src, Rd         Rd <- NOT Src                   Int  Bool
OR      Rs, Src, Rd     Rd <- Rs OR Src                 Int,Int  Bool,Bool
XOR     Rs, Src, Rd     Rd <- Rs XOR Src                Int,Int  Bool,Bool
GE      Rs, Src, Rd     Rd <- Bool:Rs >= Src            Int,Int  Bool,Bool
GT      Rs, Src, Rd     Rd <- Bool:Rs > Src             Int,Int  Bool,Bool
LE      Rs, Src, Rd     Rd <- Bool:Rs <= Src            Int,Int  Bool,Bool
LT      Rs, Src, Rd     Rd <- Bool:Rs < Src             Int,Int  Bool,Bool
EQUAL   Rs, Src, Rd     Rd <- Bool:Rs = Src (Data)      Int,Int  Bool,Bool  Sym,Sym
NEQUAL  Rs, Src, Rd     Rd <- Bool:Rs != Src (Data)     Int,Int  Bool,Bool  Sym,Sym
EQ      Rs, Src, Rd     Rd <- Bool:Rs = Src (Pointer)   All but CFut or Fut
NEQ     Rs, Src, Rd     Rd <- Bool:Rs != Src (Pointer)  All but CFut or Fut
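
The MUL and MULH entries describe the low and high 32 bits of a product. The following Python sketch shows one way to read that split, assuming 32-bit two's complement operands. The helper names are hypothetical, and the sketch does not model type checking or the Overflow fault.

```python
# Illustrative sketch; assumes 32-bit two's complement integer data.
MASK32 = (1 << 32) - 1


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & (1 << 31) else value


def mul_low(rs: int, src: int) -> int:
    """Low 32 bits of Rs*Src, as in the MUL entry."""
    return to_signed32(rs * src)


def mul_high(rs: int, src: int) -> int:
    """High 32 bits of the 64-bit product Rs*Src, as in the MULH entry."""
    return to_signed32((rs * src) >> 32)


if __name__ == "__main__":
    print(mul_high(65536, 65536), mul_low(65536, 65536))
```

Taken together, the two results hold the full 64-bit product of two 32-bit operands.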



Branches 








BR      Src             Branch forward Src words        Int
BZ      Rs, Src         Branch if Data(Rs) = 0          Int,Int
BNZ     Rs, Src         Branch if Data(Rs) != 0         Int,Int
BF      Rs, Src         Branch if Bit0(Rs) = 0          Bool,Int
BT      Rs, Src         Branch if Bit0(Rs) = 1          Bool,Int
BNIL    Rs, Src         Branch if Rs = NIL              All but CFut,Int
BNNIL   Rs, Src         Branch if Rs != NIL             All but CFut,Int



Network Instructions 

SEND    Src, P          Send Src at priority P          All but CFut
SENDE   Src, P          Send Src and terminate          All but CFut
SEND2   Src, Rs, P      Send Src then Rs                All but CFut
SEND2E  Src, Rs, P      Send Src then Rs and terminate  All but CFut



Special Instructions 

NOP                     No Operation
SUSPEND                 Terminate Thread
CALL    Src             Call system routine Src         Int



Associative Lookup Table Instructions 

ENTER   Src, Rs         Enter(Src) Into Rs              All but CFut
XLATE   Rs, Dst, C      Dst <- lookup in Rs             All but CFut
PROBE   Src, Rs         Rd <- Bool:Src is in Rs         All but CFut
INVAL                   Invalidate address register






Name    Args            Encoding                        Faults
NOP                     000000  00   00   0000000
READ    Src, Rd         000001  Rd        Src           Cfut
WRITE   Rs, Dst         000010       Rs   Dst
READR   Src, Rd         000011  Rd   00   Src           Type, Cfut, Fut, Tag*
WRITER  Rs, Dst         000100  00   Rs   Dst           Cfut
RTAG    Src, Rd         000101  Rd        Src           Cfut
WTAG    Rs, Src, Rd     000110  Rd   Rs   Src           Type, Cfut, Fut, Tag*
LDIP    Src             000111       00   Src           Type, Cfut, Fut, Tag*
LDIPR   Src             001000  00   00   Src           Type, Cfut, Fut, Tag*
CHECK   Rs, Src, Rd     001001  Rd   Rs   Src           Type, Cfut, Fut, Tag*
CARRY   Rs, Src, Rd     001010  Rd   Rs   Src           Type, Cfut, Fut, Tag*, Overflow
ADD     Rs, Src, Rd     001011  Rd   Rs   Src           Type, Cfut, Fut, Tag*, Overflow
SUB     Rs, Src, Rd     001100  Rd   Rs   Src           Type, Cfut, Fut, Tag*, Overflow
MULH    Rs, Src, Rd     001110  Rd   Rs   Src           Type, Cfut, Fut, Tag*, Overflow
MUL     Rs, Src, Rd     001111  Rd   Rs   Src           Type, Cfut, Fut, Tag*, Overflow
ASH     Rs, Src, Rd     010000  Rd   Rs   Src           Type, Cfut, Fut, Tag*, Overflow
LSH     Rs, Src, Rd     010001  Rd   Rs   Src           Type, Cfut, Fut, Tag*, Overflow
ROT     Rs, Src, Rd     010010  Rd   Rs   Src           Type, Cfut, Fut, Tag*
AND     Rs, Src, Rd     011000  Rd   Rs   Src           Type, Cfut, Fut, Tag*
OR      Rs, Src, Rd     011001  Rd   Rs   Src           Type, Cfut, Fut, Tag*
XOR     Rs, Src, Rd     011010  Rd   Rs   Src           Type, Cfut, Fut, Tag*
FFB     Src, Rd         011011  Rd        Src           Type, Cfut, Fut, Tag*
NOT     Src, Rd         011100  Rd   Rs   Src           Type, Cfut, Fut, Tag*
NEG     Src, Rd         011101  Rd   Rs   Src           Type, Cfut, Fut, Tag*, Overflow
LT      Rs, Src, Rd     100000  Rd   Rs   Src           Type, Cfut, Fut, Tag*
LE      Rs, Src, Rd     100001  Rd   Rs   Src           Type, Cfut, Fut, Tag*
GE      Rs, Src, Rd     100010  Rd   Rs   Src           Type, Cfut, Fut, Tag*
GT      Rs, Src, Rd     100011  Rd   Rs   Src           Type, Cfut, Fut, Tag*
EQUAL   Rs, Src, Rd     100100  Rd   Rs   Src           Type, Cfut, Fut, Tag*
NEQUAL  Rs, Src, Rd     100101  Rd   Rs   Src           Type, Cfut, Fut, Tag*
EQ      Rs, Src, Rd     100110  Rd   Rs   Src           Cfut, Fut
NEQ     Rs, Src, Rd     100111  Rd   Rs   Src           Cfut, Fut

XLATE   Rs, Dst, C      101000  C    Rs   Dst           Cfut, Xlate
ENTER   Src, Rs         101001  00   Rs   Src           Cfut
INVAL                   101010  00   00   0000000
PROBE   Src, Rs         101101  00   Rs   Dst           Cfut

SUSPEND                 110000  00   00   0000000       Early
CALL    Src             110001       00   Src           Type, Cfut, Fut, Tag*

SEND    Src, P          110100  P         Src           Cfut, Send
SENDE   Src, P          110101  P         Src           Cfut, Send
SEND2   Src, Rs, P      110110  P    Rs   Src           Cfut, Send
SEND2E  Src, Rs, P      110111  P    Rs   Src           Cfut, Send

BR      Src             111000       00   Src           Type, Cfut, Fut, Tag*
BNIL    Rs, Src         111010       Rs   Src           Type, Cfut, Fut, Tag*
BNNIL   Rs, Src         111011       Rs   Src           Type, Cfut, Fut, Tag*
BF      Rs, Src         111100       Rs   Src           Type, Cfut, Fut, Tag*
BT      Rs, Src         111101       Rs   Src           Type, Cfut, Fut, Tag*
BZ      Rs, Src         111110       Rs   Src           Type, Cfut, Fut, Tag*
BNZ     Rs, Src         111111       Rs   Src           Type, Cfut, Fut, Tag*
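
The encoding column above suggests a 17-bit instruction made of a 6-bit opcode, two 2-bit fields, and a 7-bit operand, with two instructions sharing one word under the INST tags. The Python sketch below packs fields on that reading. The bit ordering (fields packed most significant first, in the order printed) and the names `encode` and `pack_pair` are assumptions made for illustration, not a statement of the hardware layout.

```python
# Illustrative sketch; field order and helper names are assumptions.
FIELD_WIDTHS = (6, 2, 2, 7)   # opcode, first 2-bit field, second 2-bit field, operand


def encode(opcode: int, field_a: int, field_b: int, operand: int) -> int:
    """Pack one 17-bit instruction, most significant field first."""
    word = 0
    for value, width in zip((opcode, field_a, field_b, operand), FIELD_WIDTHS):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def pack_pair(first: int, second: int) -> int:
    """Pack two 17-bit instructions below the two leading 1 bits of an INST tag."""
    if not (0 <= first < (1 << 17) and 0 <= second < (1 << 17)):
        raise ValueError("each instruction must fit in 17 bits")
    return (0b11 << 34) | (first << 17) | second


if __name__ == "__main__":
    inval = encode(0b101010, 0, 0, 0)    # INVAL row: 101010 00 00 0000000
    print(format(inval, "017b"))
```

Under these assumptions the top four bits of a packed word are 11 followed by the two leading opcode bits of the first instruction, which matches the INST0 to INST3 rows of the type table.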






Name            Number  Description
CATASTROPHE     $0      Double fault, bad vector, or other catastrophe.
INTERRUPT       $1      Interrupt signalled by diagnostic interface.
                $2
SEND            $3      Send buffer full.
ILGINST         $4      Illegal instruction.
DRAMERR         $5      Double bit error in the external RAM.
INVADR          $6      Attempt to access data through address register with I bit set.
ADRTYPE         $7      The address index is not an integer.
LIMIT           $8      Attempt to access object data past limit.
EARLY           $9      Attempt to access data in message queue before it arrived.
MSG             $A      Bad message header.
XLATE           $B      XLATE missed.
OVERFLOW        $C      Integer arithmetic overflow.
CFUT            $D      Attempted operation on a word tagged CFUT.
FUT             $E      Attempted operation on a word tagged FUT.
TAG8            $F      Attempted operation on a word tagged TAG8.
TAG9            $10     Attempted operation on a word tagged TAG9.
TAGA            $11     Attempted operation on a word tagged TAGA.
TAGB            $12     Attempted operation on a word tagged TAGB.
TYPE            $13     Operand(s) with a bad tag type used in an instruction.
                $14-$1F Reserved for future faults.



Note: If multiple faults occur simultaneously, the fault vector chosen is the one that has the highest
precedence. Each fault is assigned a precedence by its fault number; lower fault numbers correspond to
higher precedence.
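
The precedence rule in the note can be illustrated with a small Python sketch. The dictionary holds only a subset of the fault table, with numbers read from that table, and the function name `highest_precedence` is hypothetical. The sketch models the selection rule only, not how the hardware detects or vectors faults.

```python
# Illustrative sketch; a subset of the fault table, keyed by fault name.
FAULT_NUMBERS = {
    "CATASTROPHE": 0x0,
    "INTERRUPT": 0x1,
    "SEND": 0x3,
    "ILGINST": 0x4,
    "DRAMERR": 0x5,
    "ADRTYPE": 0x7,
    "LIMIT": 0x8,
    "EARLY": 0x9,
    "OVERFLOW": 0xC,
    "CFUT": 0xD,
    "FUT": 0xE,
    "TYPE": 0x13,
}


def highest_precedence(pending: list[str]) -> str:
    """Return the pending fault with the lowest fault number."""
    if not pending:
        raise ValueError("no faults pending")
    return min(pending, key=lambda name: FAULT_NUMBERS[name])


if __name__ == "__main__":
    print(highest_precedence(["TYPE", "OVERFLOW", "LIMIT"]))
```

With TYPE, OVERFLOW, and LIMIT pending together, the sketch selects LIMIT because $8 is the lowest of the three numbers.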






